Abstract


A New $O(n^2)$ Algorithm for the Symmetric Tridiagonal Eigenvalue/Eigenvector Problem

by 

Inderjit  Singh  Dhillon 
Doctor  of  Philosophy  in  Computer  Science 

University  of  California,  Berkeley 
Professor  James  W.  Demmel,  Chair 

Computing the eigenvalues and orthogonal eigenvectors of an $n \times n$ symmetric tridiagonal
matrix is an important task that arises while solving any symmetric eigenproblem. All
practical software requires $O(n^3)$ time to compute all the eigenvectors and ensure their
orthogonality when eigenvalues are close. In the first part of this thesis we review earlier
work and show how some existing implementations of inverse iteration can fail in surprising
ways.

The main contribution of this thesis is a new $O(n^2)$, easily parallelizable algorithm
for solving the tridiagonal eigenproblem. Three main advances lead to our new algorithm.
A tridiagonal matrix is traditionally represented by its diagonal and off-diagonal elements.
Our most important advance is in recognizing that its bidiagonal factors are “better” for
computational purposes. The use of bidiagonals enables us to invoke a relative criterion to
judge when eigenvalues are “close”. The second advance comes by using multiple bidiagonal
factorizations in order to compute different eigenvectors independently of each other.
Thirdly, we use carefully chosen dqds-like transformations as inner loops to compute
eigenpairs that are highly accurate and “faithful” to the various bidiagonal representations.
Orthogonality of the eigenvectors is a consequence of this accuracy. Only $O(n)$ work per
eigenpair is needed by our new algorithm.

Conventional wisdom is that there is usually a trade-off between speed and accuracy
in numerical procedures, i.e., higher accuracy can be achieved only at the expense of
greater computing time. An interesting aspect of our work is that increased accuracy in
the eigenvalues and eigenvectors obviates the need for explicit orthogonalization and leads
to greater speed.

We present timing and accuracy results comparing a computer implementation
of our new algorithm with four existing EISPACK and LAPACK software routines. Our
test-bed contains a variety of tridiagonal matrices, some coming from quantum chemistry
applications. The numerical results demonstrate the superiority of our new algorithm. For
example, on a matrix of order 966 that occurs in the modeling of a biphenyl molecule,
our method is about 10 times faster than LAPACK’s inverse iteration on a serial IBM
RS/6000 processor and nearly 100 times faster on a 128-processor IBM SP2 parallel machine.


Professor  James  W.  Demmel 
Dissertation  Committee  Chair 


To  my  parents, 


for  their  constant  love  and  support. 
Chapter 1

Setting  the  Scene 


In this thesis, we propose a new algorithm for finding all or a subset of the eigenvalues
and eigenvectors of a symmetric tridiagonal matrix. The main advance is in being able
to compute numerically orthogonal “eigenvectors” without recourse to the Gram-Schmidt
process or a similar technique that explicitly orthogonalizes vectors. All existing
software for this problem needs to do such orthogonalization and hence takes $O(n^3)$ time
in the worst case, where n is the order of the matrix. Our new algorithm is the result of
several innovations which enable us to compute, in $O(n^2)$ time, eigenvectors that are highly
accurate and numerically orthogonal as a consequence. We believe that the ideas behind
our new algorithm can be gainfully applied to several other problems in numerical linear
algebra.

As an example of the speedups possible due to our new algorithm, the parallel
solution of a $966 \times 966$ dense symmetric eigenproblem, which comes from the modeling of a
biphenyl molecule by the Møller-Plesset theory, is now nearly 3 times faster than an earlier
implementation [39]. This speedup is a direct consequence of a 10-fold increase in speed of
the tridiagonal solution, which previously accounted for 80-90% of the total time. Detailed
numerical results are presented in Chapter 6.

Before we sketch an outline of our thesis, we list the features of an “ideal” algorithm.
1.1 Our Goals

At the outset of our research in 1993, we listed the desirable properties of the
“ultimate” algorithm for computing the eigendecomposition of the symmetric tridiagonal
matrix T. Our wish-list was for

1. An $O(n^2)$ algorithm. Such an algorithm would achieve the minimum output
complexity in computing all the eigenvectors.

2. An algorithm that guarantees accuracy. Due to the limitations of finite precision,
we cannot hope to compute the true eigenvalues and ensure that the computed
eigenvectors are exactly orthogonal. A plausible goal is to find approximate eigenpairs
$(\hat\lambda_i, \hat v_i)$, $\hat v_i^T \hat v_i = 1$, $i = 1, 2, \ldots, n$, such that

• the residual norms are small, i.e.,

$$\|(T - \hat\lambda_i I)\,\hat v_i\| = O(\varepsilon \|T\|), \tag{1.1.1}$$

• the computed vectors are numerically orthogonal, i.e.,

$$|\hat v_i^T \hat v_j| = O(\varepsilon), \quad i \neq j, \tag{1.1.2}$$

where $\varepsilon$ is the machine precision. For some discussion on how to relate the above
goals to backward errors in T, see [87, Theorem 2.1] and [71]. However, it may not
be possible to achieve (1.1.1) and (1.1.2) in all cases and we will aim for bounds that
grow slowly with n.

3.  An  embarrassingly  parallel  algorithm  that  allows  independent  computation  of 
each  eigenvalue  and  eigenvector  making  it  easy  to  implement  on  a  parallel  computer. 

4.  An  adaptable  algorithm  that  permits  computation  of  any  k  of  the  eigenvalues  and 
eigenvectors at a reduced cost of $O(nk)$ operations.

We have almost succeeded in accomplishing all these lofty goals. The algorithm
presented at the end of Chapter 5 is an $O(n^2)$, embarrassingly parallel and adaptable method
that passes all our numerical tests. Work to further improve this algorithm is still ongoing
and we believe we are very close to a provably accurate algorithm.
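The accuracy measures in goal 2 can be evaluated for any set of computed eigenpairs. The following Python sketch is illustrative only: the function names are hypothetical, the infinity norm is used as a convenient stand-in for $\|T\|$, and the test matrix tridiag(1, 2, 1), whose eigenpairs are known in closed form, is an assumption made here for demonstration and is not taken from the thesis.

```python
import math

def tridiag_matvec(d, e, v):
    """Return T v for the symmetric tridiagonal T with diagonal d and off-diagonal e."""
    n = len(d)
    w = [d[i] * v[i] for i in range(n)]
    for i in range(n - 1):
        w[i] += e[i] * v[i + 1]
        w[i + 1] += e[i] * v[i]
    return w

def accuracy_measures(d, e, pairs):
    """Return (max_i ||(T - lam_i I) v_i|| / ||T||, max_{i != j} |v_i^T v_j|)."""
    n = len(d)
    # Infinity norm of T (assumption: any consistent norm would serve here).
    norm_T = max(abs(d[i])
                 + (abs(e[i - 1]) if i > 0 else 0.0)
                 + (abs(e[i]) if i < n - 1 else 0.0) for i in range(n))
    max_res = 0.0
    for lam, v in pairs:
        Tv = tridiag_matvec(d, e, v)
        res = math.sqrt(sum((Tv[i] - lam * v[i]) ** 2 for i in range(n)))
        max_res = max(max_res, res / norm_T)
    max_dot = 0.0
    for i in range(len(pairs)):
        for j in range(i):
            dot = sum(a * b for a, b in zip(pairs[i][1], pairs[j][1]))
            max_dot = max(max_dot, abs(dot))
    return max_res, max_dot

def toeplitz_eigenpairs(n):
    """Closed-form eigenpairs of tridiag(1, 2, 1), evaluated in floating point."""
    pairs = []
    for k in range(1, n + 1):
        theta = k * math.pi / (n + 1)
        lam = 2.0 + 2.0 * math.cos(theta)
        scale = math.sqrt(2.0 / (n + 1))
        pairs.append((lam, [scale * math.sin(i * theta) for i in range(1, n + 1)]))
    return pairs

n = 8
res, orth = accuracy_measures([2.0] * n, [1.0] * (n - 1), toeplitz_eigenpairs(n))
print(res, orth)  # both should be a small multiple of the machine precision
```

For eigenpairs produced by an actual solver, the same two numbers divided by the machine precision give a direct reading of how well (1.1.1) and (1.1.2) are met.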
1.2 Outline of Thesis

The  following  summarizes  the  contents  of  this  thesis. 

1.  In  Chapter  2,  we  give  some  background  explaining  the  importance  of  the  symmetric 
tridiagonal  eigenproblem.  We  then  briefly  describe  some  of  the  existing  methods  — 
the  popular  QR  algorithm,  bisection  followed  by  inverse  iteration  and  the  relatively 
new  divide  and  conquer  approach.  In  Section  2.6,  we  compare  the  existing  methods  in 
terms  of  speed,  accuracy  and  memory  requirements  discussing  how  they  do  not  satisfy 
all  the  desirable  goals  of  Section  1.1.  Since  our  new  algorithm  is  related  to  inverse 
iteration,  in  Section  2.7  we  discuss  in  detail  the  tricky  issues  involved  in  its  computer 
implementation.  In  Section  2.8.1,  we  show  how  the  existing  LAPACK  and  EISPACK 
inverse  iteration  software  can  fail.  The  expert  reader  may  skip  this  chapter  and  move 
on  to  Chapter  3  where  we  start  describing  our  new  methods. 

2.  In  Chapter  3,  we  describe  in  detail  the  computation  of  an  eigenvector  corresponding 
to  an  isolated  eigenvalue  that  has  already  been  computed.  We  begin  by  showing  how 
some  of  the  obvious  ways  can  fail  miserably  due  to  the  tridiagonal  structure.  Twisted 
factorizations,  introduced  in  Section  3.1,  are  found  to  reveal  singularity  and  provide 
an  elegant  mechanism  for  computing  an  eigenvector.  The  method  suggested  by  these 
factorizations  may  be  thought  of  as  deterministically  “picking  a  good  starting  vector” 
for  inverse  iteration  thus  avoiding  the  random  choices  currently  used  in  LAPACK 
and  solving  a  question  posed  by  Wilkinson  in  [136,  p.318].  Section  3.4  shows  how  to 
modify  this  new  method  in  order  to  eliminate  divisions.  We  then  digress  a  little  and 
in  Sections  3.5  and  3.6  briefly  discuss  how  twisted  factorizations  can  be  employed  to 
reveal  the  rank  of  denser  matrices  and  guarantee  deflation  in  perfect  shift  strategies. 

3. In Chapter 4, we show how to independently compute eigenvectors that turn out to
be numerically orthogonal when eigenvalues differ in most of their digits. Note that
eigenvalues may have tiny absolute gaps without agreeing in any digit, e.g., $10^{-50}$ and
$10^{-51}$ are far apart from each other in a relative sense. We say that such eigenvalues
have large relative gaps. The material of this chapter represents a major advance
towards our goal of an $O(n^2)$ algorithm. Sections 4.1 and 4.2 extol the benefits of high
accuracy in the computed eigenvalues, and show that for computational purposes, a
bidiagonal factorization of T is “better” than the traditional way of representing T by
its diagonal and off-diagonal elements. In Section 4.3, we review the known properties
of  bidiagonal  matrices  that  make  them  attractive  for  computation.  Section  4.4.1  gives 
the  qd-like  recurrences  that  allow  us  to  exploit  the  good  properties  of  bidiagonals, 
and  in  Section  4.4.2  we  give  a  detailed  roundoff  error  analysis  of  their  computer 
implementations.  Section  4.5  gives  a  rigorous  analysis  that  proves  the  numerical 
orthogonality  of  the  computed  eigenvectors  when  relative  gaps  are  large.  To  conclude, 
we  present  a  few  numerical  results  in  Section  4.6  to  verify  the  above  claims. 

4.  Chapter  5  deals  with  the  case  when  relative  gaps  between  the  eigenvalues  are  small. 
In  this  chapter,  we  propose  that  for  each  such  cluster  of  eigenvalues,  we  form  an 
additional bidiagonal factorization of $T + \mu I$ where $\mu$ is close to the cluster, and then
apply  the  techniques  of  Chapter  4  to  compute  eigenvectors  that  are  automatically 
numerically  orthogonal.  The  success  of  this  approach  depends  on  finding  relatively 
robust  representations  that  are  defined  in  Section  5.2.  Section  5.3.1  introduces  the 
concept  of  a  representation  tree  which  is  a  tool  that  facilitates  proving  orthogonality 
of  the  vectors  computed  using  different  representations.  We  present  Algorithm  Y  in 
Section  5.4  that  also  handles  the  remaining  case  of  small  relative  gaps.  We  cannot 
prove  the  correctness  of  this  algorithm  as  yet  but  extensive  numerical  experience 
indicates  that  it  is  accurate. 

5. In Chapter 6 we give a detailed numerical comparison between Algorithm Y and four
existing EISPACK and LAPACK software routines. Section 6.4.1 describes our extensive
collection of test tridiagonals, some of which come from quantum chemistry
applications. We find our computer implementation of Algorithm Y to be uniformly
faster than existing implementations of inverse iteration and the QR algorithm. The
speedups range from factors of 4 to about 3500 depending on the eigenvalue distribution.
In Section 6.5, we speculate on further improvements to Algorithm Y.

6. Finally, in Chapter 7, we discuss how some of our techniques may be applicable to
other problems in numerical linear algebra.

In  addition  to  the  above  chapters,  we  have  included  some  case  studies  at  the  end 
of  this  thesis,  which  examine  various  illuminating  examples  in  detail.  Much  of  the  material 
in  the  case  studies  appears  scattered  in  various  chapters,  but  we  have  chosen  to  collate  and 
expand  on  it  at  the  end,  where  it  can  be  read  independently  of  the  main  text. 
1.3 Notation

We now say a few words about our notation. The reader would benefit by occasionally
reviewing this section while reading this thesis. We adopt
Householder’s conventions and denote matrices by uppercase roman letters such as A, B,
J, T and scalars by lowercase Greek letters $\alpha$, $\beta$, $\gamma$, $\eta$, or lowercase italic such as $a_i$, $b_i$
etc. We also try to follow the Kahan/Parlett convention of denoting symmetric matrices
by symmetric letters such as A, T and nonsymmetric matrices by B, J etc. In particular,
T stands for a symmetric tridiagonal matrix while J denotes a nonsymmetric tridiagonal.
However, we will occasionally depart from these principles; for example, the letters L and U
will denote lower and upper triangular matrices respectively while D stands for a diagonal
matrix by strong tradition. Overbars will be frequently used when more than one matrix of
a particular type is being considered, e.g., $L$ and $\bar{L}$. The submatrix of T in rows i through
j will be denoted by $T^{i:j}$ and its characteristic polynomial by $\chi^{i:j}$.

We denote vectors by lowercase roman letters such as u, v and z. The ith component
of the vector v will be denoted by $v_i$ or $v(i)$. The $(i, j)$ element of matrix A will be
denoted by $A_{ij}$. All vectors are n-dimensional and all matrices are $n \times n$ unless otherwise
stated. In cases where there are at most n non-trivial entries in a matrix, we will use only
one index to denote these matrix entries; for example, $L(i)$ might denote the $L(i+1, i)$ entry
of a unit lower bidiagonal matrix while $D(i)$ or $D_i$ can denote the $(i, i)$ element of a diagonal
matrix. The ith column vector of the matrix V will be denoted by $v_i$. We note that this
might lead to some ambiguity, but we will try to state our notation explicitly before such
usage.

We denote the n eigenvalues of a matrix by $\lambda_1, \lambda_2, \ldots, \lambda_n$, while the n singular
values are denoted by $\sigma_1, \sigma_2, \ldots, \sigma_n$. Normally we will assume that these quantities are
ordered, i.e., $\lambda_1 \le \cdots \le \lambda_n$ while $\sigma_1 \ge \cdots \ge \sigma_n$. Note that the eigenvalues are arranged
in increasing order while the singular values are in decreasing order. We have done this to
abide by existing conventions. The ordering is immaterial to most of our presentation, but
we will make explicit the order of arrangement whenever our exposition requires ordering.
Eigenvectors and singular vectors will be denoted by $v_i$ and $u_i$. The diagonal matrix of
eigenvalues and singular values will be denoted by $\Lambda$ and $\Sigma$ respectively, while V and U will
stand for matrices whose columns are eigenvectors and/or singular vectors.

Since finite precision computations are the driving force behind our work, we
briefly introduce our model of arithmetic. We assume that the floating point result of a
basic arithmetic operation $\circ$ satisfies

$$\mathrm{fl}(x \circ y) = (x \circ y)(1 + \eta) = (x \circ y)/(1 + \delta)$$

where $\eta$ and $\delta$ depend on x, y, $\circ$, and the arithmetic unit. The relative errors satisfy

$$|\eta| < \varepsilon, \quad |\delta| < \varepsilon$$

for a given $\varepsilon$ that depends only on the arithmetic unit and will be called the machine
precision. We shall choose freely the form ($\eta$ or $\delta$) that suits the analysis. We also adopt
the convention of denoting the computed value of $x$ by $\hat{x}$. In fact, we have already used
some of this notation in Section 1.1.

The IEEE single and double precision formats allow for 24 and 53 bits of precision
respectively. Thus the corresponding $\varepsilon$ are $2^{-23} \approx 1.2 \times 10^{-7}$ and $2^{-52} \approx 2.2 \times 10^{-16}$
respectively. Whenever $\varepsilon$ occurs in our analysis it will either denote machine precision
or should be taken to mean “of the order of magnitude of” the machine precision, i.e.,
$\varepsilon = O(\text{machine precision})$.
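The arithmetic model can be checked on individual operations by comparing a double precision result with the exact rational result. The sketch below is a minimal illustration using only the Python standard library; the helper name `rounding_errors` is hypothetical, and the check assumes the exact result is nonzero and that no overflow or underflow occurs.

```python
import operator
import sys
from fractions import Fraction

EPS_SINGLE = 2.0 ** -23   # machine precision for a 24-bit significand
EPS_DOUBLE = 2.0 ** -52   # machine precision for a 53-bit significand

def rounding_errors(x, y, op):
    """Return (eta, delta) with fl(x op y) = (x op y)(1 + eta) = (x op y)/(1 + delta).

    x and y are Python floats (IEEE doubles); Fraction converts them exactly,
    so `exact` is the true result of operating on the stored operands.
    """
    exact = op(Fraction(x), Fraction(y))
    computed = Fraction(op(x, y))          # the double actually produced
    eta = computed / exact - 1
    delta = exact / computed - 1
    return eta, delta

for op in (operator.add, operator.sub, operator.mul, operator.truediv):
    eta, delta = rounding_errors(0.1, 0.3, op)
    print(op.__name__, abs(eta) < EPS_DOUBLE, abs(delta) < EPS_DOUBLE)

# Python's float is an IEEE double, so its epsilon agrees with 2**-52.
print(sys.float_info.epsilon == EPS_DOUBLE)
```

With round-to-nearest arithmetic the errors are in fact bounded by half of these values; the model in the text only requires the weaker bound.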

Just as we did in the above sentence and in equations (1.1.1) and (1.1.2), we
will continue to abuse the “big oh” notation. Normally the $O$ notation, introduced by
Bachmann in 1894 [68, Section 9.2], implies a limiting process. For example, when we say
that an algorithm takes $O(n^2)$ time, we mean that the algorithm performs less than $Kn^2$
operations for some constant K as $n \to \infty$. However, in our informal discussions, sometimes
there will not be any limiting process or we may not always make it precise. In the former
case, $O$ will be a synonym for “of the order of magnitude of”. Our usage should be clear
from the context. Of course, we will be precise in the statements of our theorems; in fact,
the $O$ notation does not appear in any theorem or proof in this thesis.

We will also be sloppy in our usage of the terms “eigenvalues” and “eigenvectors”.
An unreduced symmetric tridiagonal matrix has exactly n distinct eigenvalues and
n normalized eigenvectors that are mutually orthogonal. However, in several places we
will use phrases like “the computed eigenvalues are close to the exact eigenvalues” and
“the computed eigenvectors are not numerically orthogonal”. In these phrases, we refer to
approximations to the eigenvalues and eigenvectors and we are deliberately sloppy for the
sake of brevity. In a similar vein, we will occasionally use “orthogonal” to mean numerically
orthogonal, i.e., orthogonal to working precision.
Chapter 2

Existing  Algorithms  &  their 
Drawbacks 


In  this  chapter,  we  start  by  giving  a  quick  background  to  the  problem  of  computing 
an  eigendecomposition  of  a  dense  symmetric  matrix  for  the  benefit  of  a  newcomer.  We 
then  discuss  and  compare  existing  methods  of  solving  the  resulting  tridiagonal  problem  in 
Sections  2.2  through  2.6.  Later,  in  Sections  2.7  and  2.8,  we  show  how  various  issues  that  arise 
in  implementing  inverse  iteration  are  handled  in  existing  LAPACK  [1]  and  EISPACK  [128] 
software,  and  present  some  examples  where  they  fail  to  deliver  correct  answers.  Finally,  we 
sketch our alternate approach to handling these issues in Section 2.9.

2.1  Background 

Eigenvalue computations arise in a rich variety of contexts. A quantum chemist
may compute eigenvalues to reveal the electronic energy states in a large molecule, a structural
engineer may need to construct a bridge whose natural frequencies of vibration lie
outside the earthquake band, while eigenvalues may convey information about the stability
of a market to an economist. A large number of such physically meaningful problems
may be posed as the abstract mathematical problem of finding all numbers $\lambda$ and non-zero
vectors q that satisfy the equation

$$Aq = \lambda q,$$

where A is a real, symmetric matrix of order n. $\lambda$ is called an eigenvalue of the matrix A
while q is a corresponding eigenvector.
All eigenvalues of A must satisfy the characteristic equation, $\det(A - \lambda I) = 0$, and
since the left hand side of this equation is a polynomial in $\lambda$ of degree n, A has exactly n
eigenvalues. A symmetric matrix further enjoys the properties that

1. all eigenvalues are real, and

2. a complete set of n mutually orthogonal eigenvectors may be chosen.

Thus a symmetric matrix A admits the eigendecomposition

$$A = Q \Lambda Q^T,$$

where $\Lambda$ is a diagonal matrix, $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$, and Q is an orthogonal matrix,
i.e., $Q^T Q = I$. In the case of an unreduced symmetric tridiagonal matrix, i.e., where all
off-diagonal elements of the tridiagonal are non-zero, the eigenvalues are distinct while the
eigenvectors are unique up to a scale factor and are mutually orthogonal.

Armed with this knowledge, several algorithms for computing the eigenvalues of
a real, symmetric matrix have been constructed. Prior to the 1950s, explicitly forming
and solving the characteristic equation seems to have been a popular choice. However,
eigenvalues are extremely sensitive to small changes in the coefficients of the characteristic
polynomial and the inadequacy of this representation became clear with the advent of
the modern digital computer. Orthogonal matrices became, and still remain, the most
important tools of the trade. A sequence of orthogonal transformations,

$$A_0 = A, \qquad A_{i+1} = Q_i^T A_i Q_i,$$

where $Q_i^T Q_i = I$, is numerically stable and preserves the eigenvalues of $A_i$. Many algorithms
for computing the eigendecomposition of A attempt to construct such a sequence so that
$A_i$ converges to a diagonal matrix. But Galois’ work in the nineteenth century implies that
for $n > 4$ there can be no finite m for which $A_m$ is diagonal as long as the $Q_i$ are computed
by algebraic expressions and taking kth roots. It seems natural to try to transform A to
a tridiagonal matrix instead. In 1954, Givens proposed the reduction of A to tridiagonal
form by using orthogonal plane rotations [62]. However, most current efficient algorithms
work by reducing A to a tridiagonal matrix T by a sequence of $n - 2$ orthogonal reflectors,
now named after Householder who first introduced them in 1958, see [81]. Mathematically,

$$T = (Q_{n-3}^T \cdots Q_1^T Q_0^T)\, A\, (Q_0 Q_1 \cdots Q_{n-3}) = Z^T A Z.$$
The eigendecomposition of T may now be found as

$$T = V \Lambda V^T, \tag{2.1.1}$$

where $V^T V = I$, and back-transformation may be used to find the eigenvectors of A,

$$A = (ZV) \Lambda (ZV)^T = Q \Lambda Q^T.$$
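The reduction $T = Z^T A Z$ by $n - 2$ reflectors can be sketched in a few lines. The code below is a minimal, unblocked illustration on dense lists of lists, assuming a symmetric input; the function name is hypothetical and the loop structure is chosen for readability, not for the efficiency of library implementations.

```python
import math

def householder_tridiagonalize(A):
    """Return (T, Z) with T = Z^T A Z tridiagonal, using n - 2 Householder reflectors."""
    n = len(A)
    T = [row[:] for row in A]
    Z = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n - 2):
        # The reflector H = I - 2 u u^T / (u^T u) acts on indices k+1 .. n-1 and
        # maps the part of column k below the diagonal onto a multiple of e_1.
        x = [T[i][k] for i in range(k + 1, n)]
        alpha = math.sqrt(sum(t * t for t in x))
        if alpha == 0.0:
            continue                      # column k is already in tridiagonal form
        if x[0] > 0.0:
            alpha = -alpha                # sign chosen to avoid cancellation in u[0]
        u = x[:]
        u[0] -= alpha
        uu = sum(t * t for t in u)
        off, m = k + 1, len(u)
        for j in range(n):                # T <- H T (rows off .. n-1)
            s = 2.0 * sum(u[i] * T[off + i][j] for i in range(m)) / uu
            for i in range(m):
                T[off + i][j] -= s * u[i]
        for M in (T, Z):                  # T <- T H and Z <- Z H (columns off .. n-1)
            for i in range(n):
                s = 2.0 * sum(M[i][off + j] * u[j] for j in range(m)) / uu
                for j in range(m):
                    M[i][off + j] -= s * u[j]
    return T, Z

A = [[4.0, 1.0, 2.0, 3.0],
     [1.0, 3.0, 1.0, 2.0],
     [2.0, 1.0, 2.0, 1.0],
     [3.0, 2.0, 1.0, 1.0]]
T, Z = householder_tridiagonalize(A)
for row in T:
    print(["%.4f" % t for t in row])      # entries outside the three central diagonals are ~0
```

Once eigenvectors V of T are available, the eigenvectors of A are the columns of the product ZV, which is the back-transformation step described above.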

The  tridiagonal  eigenproblem  is  one  of  the  most  intensively  studied  problems  in  numerical 
linear  algebra.  A  variety  of  methods  exploit  the  tridiagonal  structure  to  compute  (2.1.1). 
Extensive  research  has  led  to  plenty  of  software,  especially  in  the  linear  algebra  software 
libraries, EISPACK [128] and the more recent LAPACK [1]. We will examine existing
algorithms  and  related  software  in  the  next  section. 

We now briefly discuss the relative costs of the various components involved in
solving the dense symmetric eigenproblem. Reducing A to tridiagonal form by Householder
transformations costs about $\frac{4}{3}n^3$ multiplications and additions, while back-transformation to
get the eigenvectors of A from the tridiagonal solution needs $2n^3$ multiplication and addition
operations. The cost of solving the tridiagonal eigenproblem varies according to the method
used and the numerical values in the matrix. If the distribution of eigenvalues is favorable,
some methods may solve such a tridiagonal problem in $O(n^2)$ time. However, all existing
software takes $kn^3$ operations in the worst case, where k is a modest number that can vary
from 4 to 12. In many cases of practical importance, the tridiagonal problem can indeed be
the bottleneck in the total computation. The extent to which it is a bottleneck can be much
greater than suggested by the above numbers because the other two phases, Householder
reduction and back-transformation, can exploit fast matrix-multiply based operations [12,
45], whereas most algorithms for the tridiagonal problem are sequential in nature and/or
cannot be expressed in terms of matrix multiplication. We now discuss these existing
algorithms.

2.2  The  QR  Algorithm 

Till  recently,  the  method  of  choice  for  the  symmetric  tridiagonal  eigenproblem  was 
the  QR  Algorithm  which  was  independently  invented  by  Francis  [59]  and  Kublanovskaja  [95]. 
The  QR  method  is  a  remarkable  iteration  process, 

$$T_0 = T,$$
$$T_i - \sigma_i I = Q_i R_i, \tag{2.2.2}$$
$$T_{i+1} = R_i Q_i + \sigma_i I, \qquad i = 0, 1, 2, \ldots$$

where $Q_i^T Q_i = I$, $R_i$ is upper triangular and $\sigma_i$ is a shift chosen to accelerate convergence.
The off-diagonal elements of $T_i$ are rapidly driven to zero by this process. Francis, helped
by  Strachey  and  Wilkinson,  was  the  first  to  note  the  invariance  of  the  tridiagonal  form  and 
incorporate  shifts  in  the  QR  method.  These  observations  make  the  method  computationally 
viable.  The  initial  success  of  the  method  sparked  off  an  incredible  amount  of  research  into 
the  QR  method,  which  carries  on  till  this  day.  Several  shift  strategies  were  proposed  and 
convergence  properties  of  the  method  studied.  The  ultimate  cubic  convergence  of  the  QR 
algorithm  with  suitable  shift  strategies  was  observed  by  both  Kahan  and  Wilkinson.  In  1968, 
Wilkinson  proved  that  the  tridiagonal  QL  iteration  (the  QL  method  is  intimately  related 
to  the  QR  method)  always  converges  using  his  shift  strategy.  A  simpler  proof  of  global 
convergence  is  due  to  Hoffman  and  Parlett,  see  [79].  An  excellent  treatment  of  the  QR 
method  is  given  by  Parlett  in  [110]. 
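A shifted QR iteration with deflation can be sketched as follows. This is an illustrative dense-storage version, not the implicit, square-root-free variants used in production software; the function names, the convergence tolerance, and the step limit are assumptions made for this example. The shift is the eigenvalue of the trailing $2 \times 2$ block closest to its last diagonal entry (Wilkinson's shift).

```python
import math

def qr_step(T, sigma):
    """One shifted QR step: factor T - sigma*I = Q R, return R Q + sigma*I."""
    n = len(T)
    R = [[T[i][j] - (sigma if i == j else 0.0) for j in range(n)] for i in range(n)]
    rotations = []
    for k in range(n - 1):                  # n - 1 plane rotations zero the subdiagonal
        a, b = R[k][k], R[k + 1][k]
        r = math.hypot(a, b)
        c, s = (1.0, 0.0) if r == 0.0 else (a / r, b / r)
        rotations.append((c, s))
        for j in range(n):
            x, y = R[k][j], R[k + 1][j]
            R[k][j], R[k + 1][j] = c * x + s * y, -s * x + c * y
        R[k + 1][k] = 0.0
    for k, (c, s) in enumerate(rotations):  # form R Q by applying the rotations on the right
        for i in range(n):
            x, y = R[i][k], R[i][k + 1]
            R[i][k], R[i][k + 1] = c * x + s * y, -s * x + c * y
    for i in range(n):
        R[i][i] += sigma
        for j in range(n):                  # re-impose symmetric tridiagonal form,
            if abs(i - j) > 1:              # discarding rounding-level fill
                R[i][j] = 0.0
    for i in range(n - 1):
        R[i][i + 1] = R[i + 1][i]
    return R

def wilkinson_shift(T):
    """Eigenvalue of the trailing 2x2 block that is closest to its last diagonal entry."""
    m = len(T)
    a, b, c = T[m - 2][m - 2], T[m - 1][m - 2], T[m - 1][m - 1]
    if b == 0.0:
        return c
    d = 0.5 * (a - c)
    return c - b * b / (d + math.copysign(math.hypot(d, b), d))

def qr_eigenvalues(T, tol=1e-15, max_steps=1000):
    """Eigenvalues of a symmetric tridiagonal T (list of lists), in increasing order."""
    T = [row[:] for row in T]
    eigs, steps = [], 0
    while len(T) > 1 and steps < max_steps:
        m = len(T)
        if abs(T[m - 1][m - 2]) <= tol * (abs(T[m - 1][m - 1]) + abs(T[m - 2][m - 2])):
            eigs.append(T[m - 1][m - 1])    # deflate the converged eigenvalue
            T = [row[:m - 1] for row in T[:m - 1]]
            continue
        T = qr_step(T, wilkinson_shift(T))
        steps += 1
    eigs.extend(T[i][i] for i in range(len(T)))
    return sorted(eigs)

print(qr_eigenvalues([[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]]))
```

For this $3 \times 3$ example the exact eigenvalues are $2 - \sqrt{2}$, $2$ and $2 + \sqrt{2}$, which gives a simple check on the output.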

Each orthogonal matrix $Q_i$ in (2.2.2) is a product of $n - 1$ elementary rotations,
known as Givens rotations [62]. The tridiagonal matrices $T_i$ converge to diagonal form
that gives the eigenvalues of T. The eigenvector matrix of T is then given by the product
$Q_1 Q_2 Q_3 \cdots$. When only eigenvalues are desired, the QR transformation can be reorganized
to eliminate all square roots that are required to form the Givens rotations. This was first
observed by Ortega and Kaiser in 1963 [109] and a fast, stable algorithm was developed
by Pal, Walker and Kahan (PWK) in 1968-69 [110]. Since a square root operation can be
about 20 or more times as expensive as addition or multiplication, this yields a much faster
method. In particular, the PWK algorithm finds all eigenvalues of T in approximately $9n^2$
multiply and add operations and $3n^2$ divisions, with the assumption that 2 QR iterations
are needed per eigenvalue. However, when eigenvectors are desired, the product of all the
$Q_i$ must be accumulated during the algorithm. The $O(n^2)$ square root operations cannot be
eliminated in this process and approximately $6n^3$ multiplications and additions are needed
to find all the eigenvalues and eigenvectors of T. In the hope of cutting down this work
by half, Parlett suggested the alternate strategy of computing the eigenvalues by the PWK
algorithm, and then executing the QR algorithm using the previously computed eigenvalues
as origin shifts to find the eigenvectors [110, p.173]. However, this perfect shift strategy was
not found to work much better than Wilkinson’s shift strategy, taking an average of about
2 QR iterations per eigenvalue [69, 98].

2.3  Bisection  and  Inverse  Iteration 

In  1954,  Wallace  Givens  proposed  the  method  of  bisection  to  find  some  or  all  of 
the  eigenvalues  of  a  real,  symmetric  matrix  [62].  This  method  is  based  on  the  availability  of 
a simple recurrence to count the number of eigenvalues less than a floating point number $\mu$.

Let $\chi_j(\mu) = \det(\mu I - T^{1:j})$ be the characteristic polynomial of the leading principal
$j \times j$ submatrix of T. The sequence $\{\chi_0, \chi_1, \ldots, \chi_n\}$, where $\chi_0 = 1$ and $\chi_1 = \mu - T_{11}$, forms
a Sturm sequence of polynomials. The tridiagonal nature of T allows computation of $\chi_j(\mu)$
using the three-term recurrence

$$\chi_{j+1}(\mu) = (\mu - T_{j+1,j+1})\,\chi_j(\mu) - T_{j+1,j}^2\,\chi_{j-1}(\mu). \tag{2.3.3}$$

The number of sign agreements in consecutive terms of the numerical sequence
$\{\chi_i(\mu),\ i = 0, 1, \ldots, n\}$ equals the number of eigenvalues of T less than $\mu$. Based on this
recurrence, Givens devised a method that repeatedly halves the size of an interval that
contains at least one eigenvalue. However, it was soon observed that recurrence (2.3.3) was
prone to overflow with a limited exponent range. An alternate recurrence that computes
$d_j(\mu) = \chi_j(\mu)/\chi_{j-1}(\mu)$ is now used in most software [89],

$$d_{j+1}(\mu) = (\mu - T_{j+1,j+1}) - T_{j+1,j}^2 / d_j(\mu), \qquad d_1(\mu) = \mu - T_{11}.$$

The bisection algorithm permits an eigenvalue to be computed in about $2bn$ addition
and $bn$ division operations where b is the number of bits of precision in the numbers
($b = 24$ in IEEE single while $b = 53$ in IEEE double precision arithmetic). Thus all eigenvalues
may be found in $O(bn^2)$ operations. Faster iterations that are superlinearly convergent
can beat bisection and we give some references in Section 2.5.
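The ratio recurrence and the bisection loop built on it can be sketched as below. The matrix is stored as a diagonal `d` and an off-diagonal `e`; the function names, the `pivmin` guard that replaces an exactly zero ratio, and the use of Gershgorin discs for the starting interval are assumptions of this sketch, not details taken from any particular library.

```python
def count_less_than(d, e, mu, pivmin=1e-300):
    """Number of eigenvalues of the symmetric tridiagonal (d, e) that are less than mu.

    Uses the ratios d_j(mu) = chi_j(mu) / chi_{j-1}(mu): a positive ratio is a
    sign agreement between consecutive Sturm sequence terms.
    """
    count = 0
    q = 0.0
    for j in range(len(d)):
        if j == 0:
            q = mu - d[0]
        else:
            q = (mu - d[j]) - e[j - 1] * e[j - 1] / q
        if q == 0.0:
            q = pivmin                     # hypothetical guard against division by zero
        if q > 0.0:
            count += 1
    return count

def bisect_eigenvalue(d, e, k, max_iter=2000):
    """Approximate the k-th smallest eigenvalue (k = 0, 1, ..., n-1) by bisection."""
    n = len(d)
    radius = [(abs(e[i - 1]) if i > 0 else 0.0) + (abs(e[i]) if i < n - 1 else 0.0)
              for i in range(n)]
    lo = min(d[i] - radius[i] for i in range(n))   # Gershgorin bounds enclose
    hi = max(d[i] + radius[i] for i in range(n))   # every eigenvalue
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if mid <= lo or mid >= hi:
            break                          # interval has shrunk to adjacent floats
        if count_less_than(d, e, mid) > k:
            hi = mid                       # at least k+1 eigenvalues lie below mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

d, e = [2.0] * 5, [1.0] * 4
print([bisect_eigenvalue(d, e, k) for k in range(5)])
```

Each call to `count_less_than` costs one pass over the matrix, and each eigenvalue is computed independently of the others, which is what makes the method attractive when only a few eigenvalues are wanted.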

Once an accurate eigenvalue approximation $\hat\lambda$ is known, the method of inverse
iteration may be used to compute an approximate eigenvector [118, 87]:

$$v^{(0)} = b, \qquad (T - \hat\lambda I)\,v^{(i+1)} = \tau^{(i)} v^{(i)}, \qquad i = 0, 1, 2, \ldots,$$

where b is the starting vector and $\tau^{(i)}$ is a scalar.

Earlier fears about loss of accuracy in solving the linear system given above due to
the near singularity of $T - \hat\lambda I$ were allayed in [119]. Inverse iteration delivers a vector $\hat v$ that
has a small residual, i.e., small $\|(T - \hat\lambda I)\hat v\|$, whenever $\hat\lambda$ is close to $\lambda$. However, small residual
norms do not guarantee orthogonality of the computed vectors when eigenvalues are close
together. A commonly used “remedy” for clusters of eigenvalues is to orthogonalize each
approximate eigenvector, as soon as it is computed, against previously computed eigenvectors
in the cluster. Typical implementations orthogonalize using the modified Gram-Schmidt
method.
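Inverse iteration with optional modified Gram-Schmidt orthogonalization can be sketched as follows. This is a simplified illustration and not the LAPACK or EISPACK code: it uses a small dense solver with partial pivoting instead of a tridiagonal factorization, a fixed deterministic starting vector, a fixed number of iterations, and a hypothetical rule that replaces an exactly zero pivot by a tiny multiple of the largest matrix entry.

```python
import math

def solve(M, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(M)
    scale = max(abs(x) for row in M for x in row) or 1.0
    tiny = 2.0 ** -52 * scale              # hypothetical replacement for a zero pivot
    A = [row[:] + [rhs[i]] for i, row in enumerate(M)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(A[i][k]))
        A[k], A[p] = A[p], A[k]
        if A[k][k] == 0.0:
            A[k][k] = tiny
        for i in range(k + 1, n):
            f = A[i][k] / A[k][k]
            for j in range(k, n + 1):
                A[i][j] -= f * A[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = A[i][n] - sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / A[i][i]
    return x

def inverse_iteration(T, lam, prev=(), iters=3):
    """Approximate unit eigenvector of T for the eigenvalue approximation lam.

    prev holds previously computed unit vectors of the same cluster; the iterate
    is orthogonalized against them by modified Gram-Schmidt in every iteration.
    """
    n = len(T)
    M = [[T[i][j] - (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    v = [float(i + 1) for i in range(n)]   # assumed starting vector b
    for _ in range(iters):
        v = solve(M, v)                    # near singularity of M is what amplifies
        for q in prev:                     # the wanted eigenvector component
            dot = sum(a * b for a, b in zip(q, v))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        v = [a / norm for a in v]          # plays the role of the scalar tau
    return v

n = 5
T = [[2.0 if i == j else 1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
lams = [2.0 + 2.0 * math.cos(k * math.pi / (n + 1)) for k in range(1, n + 1)]
vecs = [inverse_iteration(T, lam) for lam in lams]
print(max(abs(sum(a * b for a, b in zip(vecs[i], vecs[j])))
          for i in range(n) for j in range(i)))
```

In this example the eigenvalues are well separated, so the vectors come out numerically orthogonal without using `prev`. For a tight cluster the Gram-Schmidt loop would be needed, and the orthogonalization it performs is the source of the cubic worst-case cost of this approach.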

The  amount  of  work  required  by  inverse  iteration  to  compute  all  the  eigenvectors 
of  a  symmetric  tridiagonal  matrix  strongly  depends  upon  the  distribution  of  eigenvalues. 
If eigenvalues are well-separated, then $O(n^2)$ operations are sufficient. However, when
eigenvalues are close, current implementations can take up to $10n^3$ operations due to the
orthogonalization.

2.4  Divide  and  Conquer  Methods 

In  1981,  Cuppen  proposed  a  solution  to  the  symmetric  tridiagonal  eigenproblem 
that  was  meant  to  be  efficient  for  parallel  computation  [25,  46].  It  is  quite  remarkable  that 
this  method  can  also  be  faster  than  other  implementations  on  a  serial  computer! 

The matrix $T$ may be expressed as a modification of a direct sum of two smaller
tridiagonal matrices. This modification may be a rank-one update [15], or may be
obtained by crossing out a row and column of $T$ [72]. The eigenproblem of $T$ can then be
solved in terms of the eigenproblems of the smaller tridiagonal matrices, and this may be
done recursively. For several years after its inception, it was not known how to
guarantee numerical orthogonality of the eigenvector approximations obtained by this approach.
However, in 1992, Gu and Eisenstat found a clever solution to this problem, and paved the
way for robust software based on their algorithms [72, 73, 124] and Li’s work on a faster
zero-finder [102].

The main reason for the unexpected success of divide and conquer methods on
serial machines is deflation, which occurs when an eigenpair of a submatrix of $T$ is an
acceptable eigenpair of a larger matrix. For symmetric tridiagonal matrices, this phenomenon
is quite common. The greater the amount of deflation, the less work these methods
require. The amount of deflation depends on the distribution of eigenvalues and
on the structure of the eigenvectors. In the worst case, when no deflation occurs, $O(n^3)$
operations are needed, but on matrices where eigenvalues cluster and the eigenvector
matrix contains many tiny entries, substantial deflation occurs and many fewer than $O(n^3)$
operations are required [73].

In [71, 73], Gu and Eisenstat show that by using the fast multipole method of
Carrier, Greengard and Rokhlin [70, 16], the complexity of solving the symmetric tridiagonal
eigenproblem can be considerably lowered. All the eigenvalues and eigenvectors can be found
in $O(n^2)$ operations while all the eigenvalues can be computed in $O(n \log_2 n)$ operations.
In fact, the latter method also finds the eigenvectors but in an implicit factored form
(without assembling the $n^2$ entries of all the eigenvectors) that allows multiplication of the
eigenvector matrix by a vector in about $n^2$ operations. However, the constant factor in
the above operation counts is quite high, and matrices encountered currently are not large
enough for these methods to be viable. There is no software available as yet that uses the
fast multipole method for the eigenproblem.

2.5  Other  Methods 

The  oldest  method  for  solving  the  symmetric  eigenproblem  is  one  due  to  Jacobi 
that  dates  back  to  1846  [86],  and  was  rediscovered  by  von  Neumann  and  colleagues  in 
1946  [7].  Jacobi’s  method  does  not  reduce  the  dense  symmetric  matrix  A  to  tridiagonal 
form,  as  most  other  methods  do,  but  instead  works  on  A.  It  performs  a  sequence  of  plane 
rotations  each  of  which  annihilates  an  off-diagonal  element  (which  is  filled  in  during  later 
steps).  There  are  a  variety  of  Jacobi  methods  that  differ  solely  in  their  strategies  for 
choosing  the  next  element  to  be  annihilated.  All  good  strategies  tend  to  diminish  the  off- 
diagonal  elements,  and  the  resulting  sequence  of  matrices  converges  to  the  diagonal  matrix 
of eigenvalues. Jacobi methods cost $O(n^3)$ or more operations but the constant is larger
than  in  any  of  the  algorithms  discussed  above.  Despite  their  slowness,  these  methods  are 
still  valuable  as  they  seem  to  be  more  accurate  than  other  methods  [37].  They  can  also  be 
quite  fast  on  strongly  diagonally  dominant  matrices. 
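
As an illustration of the plane rotations described above, here is a minimal cyclic Jacobi sketch in Python. It is one possible variant among the many strategies mentioned, not a reference implementation: the function name `jacobi_eigenvalues`, the row-by-row ordering of the rotations, the small-angle choice of the rotation and the fixed number of sweeps are assumptions made for this example.

```python
import math

def jacobi_eigenvalues(A, sweeps=10):
    """Approximate the eigenvalues of a symmetric matrix A (list of lists) by
    cyclic Jacobi sweeps; returns the diagonal of the rotated matrix, sorted."""
    n = len(A)
    A = [row[:] for row in A]                 # work on a copy
    for _ in range(sweeps):
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # Angle with |theta| <= pi/4 such that the rotated (p, q) entry
                # vanishes: tan(2 theta) = 2 a_pq / (a_qq - a_pp).
                diff = A[q][q] - A[p][p]
                if diff == 0.0:
                    theta = math.pi / 4.0
                else:
                    theta = 0.5 * math.atan(2.0 * A[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):            # A <- A J   (columns p and q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p] = c * akp - s * akq
                    A[k][q] = s * akp + c * akq
                for k in range(n):            # A <- J^T A (rows p and q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k] = c * apk - s * aqk
                    A[q][k] = s * apk + c * aqk
    return sorted(A[i][i] for i in range(n))

# Example: 4 x 4 matrix with 2 on the diagonal and -1 next to it, whose
# eigenvalues are known to be 2 - 2 cos(k pi / 5), k = 1..4.
M = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(4)]
     for i in range(4)]
approx = jacobi_eigenvalues(M)
```

Note that a rotation in the $(p, q)$ plane generally refills entries annihilated earlier, which is why several sweeps are needed.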

The  bisection  algorithm  discussed  in  Section  2.3  is  a  reliable  way  to  compute 
eigenvalues.  However,  it  can  be  quite  slow  and  there  have  been  many  attempts  to  find  faster 
zero-finders  such  as  the  Rayleigh  Quotient  Iteration  [110],  Laguerre’s  method  [90,  113]  and 
the  Zeroin  scheme  [31,  13].  These  zero-finders  can  considerably  speed  up  the  computation 
of  isolated  eigenvalues  but  they  seem  to  stumble  when  eigenvalues  cluster. 

Homotopy methods for the symmetric eigenproblem were suggested by Chu in [23,
24]. These methods start from an eigenvalue of a simpler matrix $D$ and follow a smooth
curve to find an eigenvalue of $A(t) = D + t(A - D)$. $D$ was chosen to be the diagonal
of the tridiagonal in [103], but greater success was obtained by taking $D$ to be a direct
sum of submatrices of $T$ [105, 99]. An alternate divide and conquer method that finds the
eigenvalues by using Laguerre’s iteration instead of homotopy methods is given in [104].
The corresponding eigenvectors in these methods are obtained by inverse iteration.

2.6  Comparison  of  Existing  Methods 

All currently implemented software for finding all the eigenvalues and eigenvectors
of a symmetric tridiagonal matrix requires $O(n^3)$ work in the worst case. The fastest current
implementation is the divide and conquer method of [73]. As mentioned in Section 2.4,
many fewer than $O(n^3)$ operations are needed when heavy deflation occurs. In fact, for
some matrices, such as a small perturbation of the identity matrix, just $O(n)$ operations
are sufficient to solve the eigenproblem. This method was designed to work well on parallel
computers, offering both task and data parallelism [46]. Efficient parallel implementations
are not straightforward to program, and the decision to switch from task to data parallelism
depends on the characteristics of the underlying machine [17]. Due to such complications, all
the currently available parallel software libraries, such as ScaLAPACK [22] and PeIGS [52],
use algorithms based on bisection and inverse iteration. A drawback of the current divide
and conquer software in LAPACK is that it needs extra workspace of more than $2n^2$ floating
point numbers, which can be prohibitive for large problems. Also, the divide
and conquer algorithm does not allow the computation of a subset of the eigenvalues and
eigenvectors at a proportionately reduced cost.

The bisection algorithm enables any subset of $k$ eigenvalues to be computed with
$O(nk)$ operations. Each eigenvalue can be found independently and this makes it suitable for
parallel computation. However, bisection is slow if all the eigenvalues are needed. A faster
root-finder, such as Zeroin [31, 13], speeds up computation when an eigenvalue is isolated
in an interval. Multisection maintains the simplicity of bisection and, in certain situations,
can speed up the performance on a parallel machine [106, 11, 127]. When the $k$ eigenvalues
are well separated, inverse iteration can find the eigenvectors independently, each in $O(n)$
time. However, to find eigenvectors of $k$ close eigenvalues, all existing implementations
resort to reorthogonalization and this costs $O(nk^2)$ operations. Orthogonalization can also
lead to heavy communication in a parallel implementation. The computed eigenvectors
may also not be accurate enough in some rare cases. See Section 2.8.1 for more details.
Despite these drawbacks, the embarrassingly parallel nature of bisection followed by inverse
iteration makes it easy and efficient to program on a parallel computer. As a result, this is
the method that has been implemented in current software libraries for distributed memory
computers, such as ScaLAPACK [47], PeIGS [52, 54] and in [75].

Like the recent divide and conquer methods, the QR algorithm guarantees numerical
orthogonality of the computed eigenvectors. The $O(n^2)$ computation performed in the
QR method to find all the eigenvalues is sequential in nature and is not easily parallelized
on modern parallel machines despite the attempts in [96, 132, 93]. However, the $O(n^3)$
computation in accumulating the Givens rotations into the eigenvector matrix is trivially and
efficiently parallelized; see [3] for more details. But the higher operation count and inability
to exploit fast matrix-multiply based operations make the QR algorithm much slower than
divide and conquer and also slower, on average, than bisection followed by inverse iteration.

Prior  to  beginning  this  work,  we  were  hopeful  of  finding  an  algorithm  that  could 
fulfil  the  rather  ambitious  goals  of  Section  1.1  because 

• when eigenvalues are well separated, bisection followed by inverse iteration requires
$O(n^2)$ operations, and

•  when  eigenvalues  are  clustered,  the  divide  and  conquer  method  is  very  fast. 

The  above  observations  suggest  a  hybrid  algorithm  that  solves  clusters  by  the 
divide  and  conquer  algorithm  and  computes  eigenvectors  of  isolated  eigenvalues  by  inverse 
iteration.  Indeed  such  an  approach  has  been  taken  in  [60].  Another  alternative  is  to  perform 
bisection and inverse iteration in higher precision. This may be achieved by simulating
quadruple precision, i.e., doubling the precision of the machine’s native arithmetic, in
software in an attempt to obviate the need for orthogonalization [38].

We  choose  to  take  a  different  approach  in  our  thesis.  By  revisiting  the  problem 
at  a  more  fundamental  level,  we  have  been  able  to  arrive  at  an  algorithm  that  shares 
the  attractive  features  of  inverse  iteration  and  the  divide  and  conquer  method.  Since  our 
approach  can  be  viewed  as  an  alternate  way  of  doing  inverse  iteration,  we  now  look  in  detail 
at  the  issues  involved  in  any  implementation  of  inverse  iteration.  Although  many  surprising 
aspects  of  current  implementations  are  revealed  by  our  careful  examination,  the  reader  who 
is pressed for time may skip on to Chapter 3 for the main results of this thesis. The material
in the upcoming sections also appears in [40].

2.7  Issues  in  Inverse  Iteration 

Inverse Iteration is a method to find an eigenvector when an approximate eigenvalue
$\hat\lambda$ is known:

$$v^{(0)} = b, \qquad (A - \hat\lambda I)\, v^{(i+1)} = \tau^{(i)} v^{(i)}, \quad i = 0, 1, 2, \ldots$$  (2.7.4)

Here $b$ is the starting vector. Usually $\|v^{(i)}\| = 1$ and $\tau^{(i)}$ is a scalar chosen to try
and make $\|v^{(i+1)}\| \approx 1$. We now list the key issues that arise in a computer implementation.

I. Choice of shift. Given an approximate eigenvalue $\hat\lambda$, what shift $\sigma$ should be chosen
when doing the inverse iteration step:

$$(A - \sigma I)\, v^{(i+1)} = \tau^{(i)} v^{(i)}\ ?$$  (2.7.5)

Should $\sigma$ always equal $\hat\lambda$? How close should $\sigma$ be to an eigenvalue? Should the
accuracy of $\hat\lambda$ be checked?

II.  Direction  of  starting  vector.  How  should  b  be  chosen? 

III. Scaling of right hand side. When the shift $\sigma$ is very close to an eigenvalue,
$\|(A - \sigma I)^{-1}\|$ is large and solving (2.7.5) may result in overflow. Can $\tau^{(i)}$ be chosen
to prevent overflow?

IV.  Convergence  Criterion.  When  does  an  iterate  satisfy  (1.1.1)?  If  the  criterion 
for  acceptance  is  too  strict,  the  iteration  may  never  stop  and  the  danger  of  too  loose 
a  criterion  is  that  poorer  approximations  than  necessary  may  be  accepted. 

V.  Orthogonality.  Will  the  vectors  for  different  eigenvalues  computed  by  (2.7.4)  be 
numerically  orthogonal?  If  not,  what  steps  must  be  taken  to  ensure  orthogonality? 

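We now examine these issues in more detail. Before we do so, it is instructive to
look at the first iterate of (2.7.5) in the eigenvector basis. Suppose that $b$ is a starting vector
with $\|b\|_2 = 1$, and that $\sigma$ is an approximation to the eigenvalue $\lambda_1$. Writing $b$ in terms of
the eigenvectors, $b = \sum_{i=1}^{n} \xi_i v_i$, we get, in exact arithmetic,

$$v^{(1)} = \tau^{(1)} (A - \sigma I)^{-1} b = \tau^{(1)} \left( \frac{\xi_1}{\lambda_1 - \sigma}\, v_1 + \sum_{i=2}^{n} \frac{\xi_i}{\lambda_i - \sigma}\, v_i \right)$$

$$\Rightarrow \quad v^{(1)} = \frac{\xi_1 \tau^{(1)}}{\lambda_1 - \sigma} \left( v_1 + \sum_{i=2}^{n} \frac{\xi_i}{\xi_1}\, \frac{\lambda_1 - \sigma}{\lambda_i - \sigma}\, v_i \right).$$  (2.7.6)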

I. Choice of shift. The above equation shows that for $v^{(1)}$ to be a good approximation
to $v_1$, $\sigma$ must be close to $\lambda_1$. But in such a case, the linear system (2.7.5) is
ill-conditioned and small changes in $\sigma$ or $A$ can lead to large changes in the solution
$v^{(i+1)}$. Initially, it was feared that roundoff error would destroy these calculations
in finite precision arithmetic. However, Wilkinson showed that the errors made in
computing $v^{(i+1)}$, although large, are almost entirely in the direction of $v_1$ when $\lambda_1$
is isolated. Since we are interested only in computing the direction of $v_1$, these errors
pose no danger; see [119]. Thus, to compute the eigenvector of an isolated eigenvalue,
the more accurate the shift, the better the approximate eigenvector.


It is common practice now to compute eigenvalues first, and then invoke inverse
iteration with very accurate $\sigma$. Due to the fundamental limitations of finite precision
arithmetic, eigenvalues of symmetric matrices can, in general, only be computed to
a guaranteed accuracy of $O(\varepsilon\|A\|)$ [110]. Even when a very accurate eigenvalue
approximation is available, the following may influence the choice of the shift when more
than one eigenvector is desired.


•  The  pairing  problem.  In  [20],  Chandrasekaran  gives  a  surprising  example 
showing  how  inverse  iteration  can  fail  to  give  small  residuals  in  exact  arithmetic 
if  the  eigenvalues  and  eigenvectors  are  not  paired  up  properly.  We  reproduce 
the example in Section 2.8. To prevent such an occurrence, Chandrasekaran
proposes perturbing the eigenvalue approximations so that each shift used for
inverse iteration lies to the left of, i.e., is smaller than, its nearest eigenvalue (see
Example 2.8.1 for more details).

• The separation problem. The solution $v^{(i+1)}$ in (2.7.5) is very sensitive to
small changes in $\sigma$ when there is more than one eigenvalue near $\sigma$. In [136,
p.329], Wilkinson notes that

‘The extreme sensitivity of the computed eigenvector to very small
changes in $\lambda$ [$\sigma$ in our notation] may be turned to practical advantage
and used to obtain independent eigenvectors corresponding to coincident
or pathologically close eigenvalues’.

Wilkinson proposed that such nearby eigenvalues be ‘artificially separated’ by a
tiny amount.

II. Direction of starting vector. From (2.7.6), assuming that $|\lambda_1 - \sigma| \ll |\lambda_i - \sigma|$ for
$i \neq 1$, $v^{(1)}$ is a good approximation to $v_1$ provided that the coefficient $\xi_1$ of $v_1$ in $b$ is not
“negligible”, i.e., the starting vector must have a non-negligible component in the direction
of the desired eigenvector. In [136, pp. 315-321], Wilkinson investigates and rejects the choice of $e_1$
or $e_n$ as a starting vector (where $e_i$ is the $i$th column of the $n \times n$ identity matrix). $e_k$
is a desirable choice for a starting vector if the $k$th component of $v_1$ is above average
($> 1/\sqrt{n}$). In the absence of an efficient procedure to find such a $k$, Wilkinson
proposed choosing $PLe$ as the starting vector, where $T - \sigma I = PLU$ and $e$ is the
vector of all 1’s [134, 136]. A random starting vector is a popular choice since the
probability that it has a negligible component in the desired direction is extremely
low; see [87] for a detailed study.

III. Scaling of right hand side. Equation (2.7.6) implies that $\|v^{(1)}\| = O(|\tau^{(1)}/(\lambda_1 - \sigma)|)$,
where $\tau^{(1)}$ is the scale factor in the first iteration of (2.7.5). If $\sigma$ is very close to an
eigenvalue, $\|v^{(1)}\|$ can be very large and overflow may occur and lead to breakdown in the
eigenvector computation. To avoid such overflow, $\tau^{(1)}$ should be chosen appropriately
to scale down the right hand side. This approach is taken in EISPACK and LAPACK.

IV. Convergence Criterion. In the iteration (2.7.4), when is $v^{(i+1)}$ an acceptable
eigenvector? The residual norm is

$$\frac{\|(A - \hat\lambda I)\, v^{(i+1)}\|}{\|v^{(i+1)}\|} = \frac{\|\tau^{(i)} v^{(i)}\|}{\|v^{(i+1)}\|}.$$  (2.7.7)

The factor $\|v^{(i+1)}\|/\|\tau^{(i)} v^{(i)}\|$ is called the norm growth. To guarantee (1.1.1), $v^{(i+1)}$
is usually accepted when the norm growth is $O(1/(n\varepsilon\|T\|))$; see [136, p.324] for details.
For the basic iteration of (2.7.4) this convergence criterion can always be met in
a few iterations, provided the starting vector is not pathologically deficient in the
desired eigenvector and $|\lambda_i - \hat\lambda_i| = O(n\varepsilon\|T\|)$. As we have mentioned before, these
requirements are easily met.

Since the eigenvalue approximations are generally input to inverse iteration, what
should the software do if the input approximations are not accurate, i.e., bad data
is input to inverse iteration? We believe that the software should raise some sort
of error flag either by testing for the accuracy of the input eigenvalues, or through
non-convergence of the iterates.

When $\lambda_i$ is isolated, a small residual implies orthogonality of the computed vector to
other eigenvectors (see (2.7.8) below). However, when $\lambda_i$ is in a cluster, goal (1.1.2)
is not automatic. As we now discuss, the methods used to compute numerically
orthogonal vectors can impact the choice of the convergence criterion.


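V. Orthogonality. Standard perturbation theory [110, Section 11-7] says that if $\hat v$ is a
unit vector, $\lambda$ is the eigenvalue closest to $\hat\lambda$ and $v$ is $\lambda$’s eigenvector, then

$$|\sin \angle(v, \hat v)| \le \frac{\|A\hat v - \hat\lambda \hat v\|}{\mathrm{gap}(\hat\lambda)}$$  (2.7.8)

where $\mathrm{gap}(\hat\lambda) = \min_{\lambda_i \neq \lambda} |\hat\lambda - \lambda_i|$.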

In particular, the above implies that the simple iteration scheme of (2.7.4) cannot
guarantee orthogonality of the computed “eigenvectors” when eigenvalues are close.
To achieve numerical orthogonality, current implementations modify (2.7.4) by
explicitly orthogonalizing each iterate against eigenvectors of nearby eigenvalues that have
already been computed.
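
A small numerical check of the bound (2.7.8) shows why a tiny residual says little about direction when the gap is also tiny. The Python sketch below uses a made-up $3 \times 3$ diagonal matrix $\mathrm{diag}(0, \delta, 1)$, whose eigenvectors are the coordinate vectors, and a unit vector that mixes the first two of them. The function name `residual_gap_angle` and the values of `delta` and `theta` are assumptions chosen for this illustration, and the Rayleigh quotient is used as the eigenvalue approximation.

```python
import math

def residual_gap_angle(delta, theta):
    """For A = diag(0, delta, 1) and v = (cos theta, sin theta, 0), return the
    residual norm, the gap, and the sine of the angle between v and the
    eigenvector belonging to the eigenvalue closest to the Rayleigh quotient."""
    lam = [0.0, delta, 1.0]                       # eigenvalues of A
    v = [math.cos(theta), math.sin(theta), 0.0]   # unit vector mixing e1 and e2
    lam_hat = sum(l * x * x for l, x in zip(lam, v))          # Rayleigh quotient
    residual = math.sqrt(sum(((l - lam_hat) * x) ** 2 for l, x in zip(lam, v)))
    closest = min(range(3), key=lambda i: abs(lam[i] - lam_hat))
    gap = min(abs(lam_hat - lam[i]) for i in range(3) if i != closest)
    # The sine of the angle to e_closest is the norm of the other components.
    sin_angle = math.sqrt(sum(x * x for i, x in enumerate(v) if i != closest))
    return residual, gap, sin_angle

# Two eigenvalues only 1e-10 apart: the residual is tiny, the angle is not.
residual, gap, sin_angle = residual_gap_angle(delta=1e-10, theta=0.5)
bound = residual / gap
```

Here the residual is below $10^{-10}$, yet the vector is far from the nearest eigenvector; the bound `residual / gap` is of order one and therefore gives no useful guarantee.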

However,  orthogonalization  can  fail  if  the  vectors  to  be  orthogonalized  are  close  to 
being  parallel.  When  this  happens,  two  surprising  difficulties  arise  : 


•  The  orthogonalized  vectors  may  not  provide  an  orthogonal  basis  of  the  desired 
subspace. 

•  Orthogonalization  may  lead  to  cancellation  and  a  decrease  in  norm  of  the  iterate. 
Thus  a  simple  convergence  criterion  (as  suggested  above  in  issue  IV)  may  not  be 
reached. 


The eigenvalues found by the QR algorithm and the divide and conquer method can
be in error by $O(\varepsilon\|T\|)$. As a result, approximations to small eigenvalues may not
be correct in most of their digits. Thus computed eigenvalues may not be accurate
“enough”, leading to the above failures surprisingly often. We give examples of such
occurrences in Section 2.8.1. In response to the above problems, Chandrasekaran
proposes in [20] a new version of inverse iteration that is considerably different from the
EISPACK and LAPACK implementations. The differences include an alternate
convergence criterion. The drawback of this new version is the potential increase in
the amount of computation required.

Inverse Iteration($A$, $\hat\lambda$)

/* assume that $\hat v_1, \hat v_2, \ldots, \hat v_{j-1}$ have been computed, and $\hat\lambda_i, \hat\lambda_{i+1}, \ldots, \hat\lambda_j$ form a cluster */
Choose a starting vector $b_j$;
Orthogonalize $b_j$ against $\hat v_i, \hat v_{i+1}, \ldots, \hat v_{j-1}$;
$l = 0$; $v^{(0)} = b_j$;
do
    Solve $(A - \hat\lambda_j I)\, v^{(l+1)} = \tau^{(l)} v^{(l)}$;
    Orthogonalize $v^{(l+1)}$ against $\hat v_i, \hat v_{i+1}, \ldots, \hat v_{j-1}$;
    $l = l + 1$;
while ($\|v^{(l)}\|/\|\tau^{(l-1)} v^{(l-1)}\|$ is not “big” enough)


Figure  2.1:  A  typical  implementation  of  Inverse  Iteration  to  compute  the  jth  eigenvector 

2.8  Existing  Implementations 

Figure 2.1 gives the pseudocode for a typical implementation of inverse iteration to
compute $\hat v_j$, the $j$th eigenvector, assuming that $\hat v_1, \hat v_2, \ldots, \hat v_{j-1}$ have already been computed.
Note that in this pseudocode, both the starting vectors and iterates are orthogonalized
against previously computed eigenvectors. Surprisingly, as the following example shows, this
implementation can fail to give small residual norms even in exact arithmetic by incorrectly
pairing up the eigenvalues and eigenvectors.

Example 2.8.1 [The Pairing Error.] (Chandrasekaran [20]) Let $\lambda_1$ be an arbitrary real
number, and

$$\lambda_2 = \lambda_1 + \varepsilon, \qquad \lambda_{i+1} - \lambda_i = \lambda_i - \lambda_1, \quad i = 2, \ldots, n-1,$$

where $\varepsilon$ is of the order of the machine precision. Explicitly, $\lambda_i = \lambda_1 + 2^{i-2}\varepsilon$ for $i \ge 2$. Suppose that
$\hat\lambda_i > \lambda_i$, $i = 1, \ldots, n$ and, most importantly, $\hat\lambda_1 - \lambda_1 > \lambda_2 - \hat\lambda_1$.

Figure 2.2 illustrates the situation.

Figure 2.2: Eigenvalue distribution in Example

Assume that in Figure 2.1 each $b_j$ is orthogonalized against $\hat v_1, \hat v_2, \ldots, \hat v_{j-1}$. If
$b_j^T v_{j+1} \neq 0$, then in exact arithmetic the computed eigenvectors are

$$\hat v_i = v_{i+1}, \quad i = 1, \ldots, n-1,$$

and, because $\hat v_n$ must be orthogonal to $\hat v_1, \hat v_2, \ldots, \hat v_{n-1}$,

$$\hat v_n = v_1.$$

Since the eigenvalues grow exponentially, the residual norm $\|(A - \hat\lambda_n I)\hat v_n\|$ is large! This
is because even though the eigenvectors have been computed correctly, each is associated
with the wrong eigenvalue. □
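
The pairing mechanism of Example 2.8.1 can be replayed with exact rational arithmetic. The Python sketch below does not run an actual inverse iteration. It assumes, as the example argues, that each computed vector is the eigenvector of the nearest eigenvalue whose eigenvector has not already been produced, and it only tracks which eigenvalue gets paired with which shift. The function name `pairing_demo`, the choice $n = 8$, the value of `eps` and the particular errors in the approximations are assumptions of this illustration.

```python
from fractions import Fraction

def pairing_demo(n=8, eps=Fraction(1, 2 ** 20)):
    """Return triples (j, i, r): the j-th shift picks eigenvector i, and r is
    the resulting residual |lambda_i - lambda_hat_j| in exact arithmetic."""
    lam1 = Fraction(1)
    # lambda_i = lambda_1 + 2^(i-2) * eps for i >= 2
    lam = [lam1] + [lam1 + 2 ** (i - 2) * eps for i in range(2, n + 1)]
    # Approximations lie above the true eigenvalues; the first one is closer
    # to lambda_2 than to lambda_1 (the crucial assumption of the example).
    lam_hat = [lam[0] + Fraction(3, 4) * eps] + [l + eps / 4 for l in lam[1:]]
    taken, pairs = set(), []
    for j in range(n):
        free = [i for i in range(n) if i not in taken]
        i = min(free, key=lambda idx: abs(lam[idx] - lam_hat[j]))
        taken.add(i)
        pairs.append((j + 1, i + 1, abs(lam[i] - lam_hat[j])))
    return pairs

pairs = pairing_demo()
```

With these values every shift except the last is paired with the next eigenvector, and the last shift is left with the first eigenvector, so its residual is about $2^{n-2}\varepsilon$ rather than a fraction of $\varepsilon$.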


Hence, a simple inverse iteration code based on orthogonalization may appear
to fail even in exact arithmetic. To cure this problem, Chandrasekaran proposes that
$\hat\lambda_i - O(n\varepsilon\|A\|)$ be used as the shifts for inverse iteration so that all shifts are guaranteed
to lie to the left of the actual eigenvalues [20]. Neither EISPACK nor LAPACK does this
‘artificial’ perturbation.

The  discerning  reader  will  realize  that  the  above  problem  is  not  the  failure  of 
the  basic  inverse  iteration  process.  Iterates  do  converge  to  the  closest  eigenvector  that 
is  orthogonal  to  the  eigenvectors  computed  earlier.  The  error  is  elusive  but  once  seen,  it 
may  be  argued  that  the  implementation  in  Figure  2.1  is  sloppy.  An  easy  cure  would  be 
to  associate  with  each  computed  eigenvector  its  Rayleigh  Quotient,  which  is  available  at  a 
modest  cost.  Unfortunately,  because  of  the  premium  on  speed,  most  current  software  does 
not  check  if  its  output  is  correct.  Thus,  errors  can  go  undetected  since  the  task  of  proving 
correctness  of  numerical  software  is  often  compromised  by  testing  it  on  a  finite  sample  of  a 
multi-dimensional  infinite  space  of  inputs. 

We now look in detail at two existing implementations of inverse iteration and
see how they address the issues discussed in the previous section. EISPACK [128] and
LAPACK [1] are linear algebra software libraries that contain routines to solve various
eigenvalue problems. EISPACK’s implementation of inverse iteration is named Tinvit
while LAPACK’s inverse iteration subroutine is called xStein¹ (Stein is an acronym for
Symmetric Tridiagonal’s Eigenvectors through Inverse Iteration). xStein was developed to
be more accurate than Tinvit as the latter was found to deliver less than satisfactory results
in several test cases. In order to achieve accuracy comparable to that of the divide and
conquer and QR/QL methods, the search for a better implementation of inverse iteration
led to xStein [87]. However, as we will see in Section 2.8.1, xStein also suffers from some
of the same problems as Tinvit in addition to introducing a new serious error.

Both EISPACK and LAPACK solve the dense symmetric eigenproblem by
reducing the dense matrix to tridiagonal form by Householder transformations [81], and then
finding the eigenvalues and eigenvectors of the tridiagonal matrix. Both Tinvit and xStein
operate on a symmetric tridiagonal matrix. In the following, we will further assume that
the tridiagonal is unreduced, i.e., all the off-diagonal elements are nonzero.

2.8.1  EISPACK  and  LAPACK  Inverse  Iteration 

Figure  2.3  gives  the  pseudocode  for  Tinvit  [128,  118]  while  Figure  2.4  outlines  the 
pseudocode  for  xStein  as  it  appears  in  LAPACK  release  2.0.  The  latter  code  has  changed 
little  since  it  was  first  released  in  1992.  It  is  not  necessary  for  the  reader  to  absorb  all 
details  of  the  implementations  given  in  Figures  2.3  and  2.4  to  follow  the  ensuing  discussion. 
We  provide  the  pseudocodes  as  references  in  case  the  reader  needs  to  look  in  detail  at  a 
particular  aspect  of  the  implementations. 

In each iteration, Tinvit and xStein solve the scaled linear system $(T - \hat\lambda I) y = \tau b$
by  Gaussian  Elimination  with  partial  pivoting.  If  eigenvalues  agree  in  more  than  three 
digits  relative  to  the  norm,  the  iterates  are  orthogonalized  against  previously  computed 
eigenvectors  by  the  modified  Gram-Schmidt  method.  Note  that  in  both  these  routines  the 
starting  vector  is  not  made  orthogonal  to  previously  computed  eigenvectors,  as  is  done  in 
Figure  2.1.  Both  Tinvit  and  xStein  flag  an  error  if  the  convergence  criterion  is  not  satisfied 
within  five  iterations.  To  achieve  greater  accuracy,  xStein  does  two  extra  iterations  after 
the  stopping  criterion  is  satisfied.  We  now  compare  and  contrast  how  these  implementations 
handle  the  various  issues  discussed  in  Section  2.7. 

¹ The prefix ‘x’ stands for the data type: real single (S), real double (D), complex single (C) or complex
double (Z).


Tinvit($T$, $\hat\lambda$)

/* Tinvit computes all the eigenvectors of $T$ given the computed eigenvalues $\hat\lambda_1, \ldots, \hat\lambda_n$ */
for $j = 1, n$
    $\sigma_j = \hat\lambda_j$;
    if $j > 1$ and $\hat\lambda_j \le \sigma_{j-1}$ then
        $\sigma_j = \sigma_{j-1} + \varepsilon\|T\|_R$;   /* Perturb identical eigenvalues */
    end if
    Factor $T - \sigma_j I = PLU$;   /* Gaussian Elimination with partial pivoting */
    if $U(n,n) = 0$ then $U(n,n) = \varepsilon\|T\|_R$;
    $\tau = \sqrt{n}\,\varepsilon\|T\|_R$;   /* Compute scale factor */
    Solve $Uy = \tau e$;   /* Solve. Here, $e$ is the vector of all 1’s */
    for all $k < j$ such that $|\sigma_j - \sigma_k| < 10^{-3}\|T\|_R$ do
        $y = y - (y^T \hat v_k)\hat v_k$;   /* Apply Modified Gram-Schmidt */
    end for
    $b = y$; iter = 1;
    while ($\|y\|_1 < 1$ and iter < 5) do
        $\tau = n\varepsilon\|T\|_R/\|b\|_1$;   /* Compute scale factor */
        Solve $PLUy = \tau b$;   /* Solve with scaled right hand side */
        for all $k < j$ such that $|\sigma_j - \sigma_k| < 10^{-3}\|T\|_R$ do
            $y = y - (y^T \hat v_k)\hat v_k$;   /* Apply Modified Gram-Schmidt */
        end for
        $b = y$; iter = iter + 1;
    end while
    if $\|y\|_1 < 1$ then
        $\hat v_j = 0$; ierr = $-j$;   /* set error flag */
        print “jth eigenvector failed to converge”;
    else
        $\hat v_j = y/\|y\|_2$;
    end if
end for


Figure 2.3: Tinvit, EISPACK’s implementation of Inverse Iteration

xStein($T$, $\hat\lambda$)

/* xStein computes all the eigenvectors of $T$ given the eigenvalues $\hat\lambda$ */
for $j = 1, n$
    $\sigma_j = \hat\lambda_j$;
    if $j > 1$ and $\hat\lambda_j - \sigma_{j-1} < 10\varepsilon|\hat\lambda_j|$ then
        $\sigma_j = \sigma_{j-1} + 10\varepsilon|\hat\lambda_j|$;   /* Perturb nearby eigenvalues */
    end if
    Factor $T - \sigma_j I = PLU$;   /* Gaussian Elimination with partial pivoting */
    Initialize vector $b$ with random vector;
    iter = 0; extra = 0; converged = false;
    do
        $\tau = n\|T\|_1 \max(\varepsilon, |U(n,n)|)/\|b\|_1$;   /* Compute scale factor */
        Solve $PLUy = \tau b$;   /* Solve with scaled right hand side */
        for all $k < j$ such that $|\sigma_j - \sigma_k| < 10^{-3}\|T\|_1$ do
            $y = y - (y^T \hat v_k)\hat v_k$;   /* Apply Modified Gram-Schmidt */
        end for
        $b = y$; iter = iter + 1;
        if converged == true then extra = extra + 1; end if
        if $\|y\|_\infty \ge \sqrt{1/(10n)}$ then converged = true; end if
    while ((converged == false or extra < 2) and iter < 5)
    $\hat v_j = y/\|y\|_2$;
    if iter == 5 and extra < 2 then
        print “jth eigenvector failed to converge”;
    end if
end for


Figure 2.4: xStein, LAPACK’s implementation of Inverse Iteration

I. Direction of starting vector. Tinvit chooses the starting vector to be $PLe$ where
$T - \sigma I = PLU$, $\sigma$ being an approximation to the eigenvalue, and $e$ is the vector of all
1’s. Note that this choice of starting vector reduces the first iteration to simply solving
$Uy = \tau e$. On the other hand, xStein chooses a random starting vector, each of whose
elements comes from a uniform $(-1, 1)$ distribution. Neither choice of starting vector
is likely to be pathologically deficient in the desired eigenvector. The random starting
vectors are designed to be superior to Tinvit’s choice [87].

II. Choice of shift. Even though in exact arithmetic all the eigenvalues of an unreduced
tridiagonal matrix are distinct, some of the computed eigenvalues may be identical
to working accuracy. In [136, p.329], Wilkinson recommends that pathologically close
eigenvalues be perturbed by a small amount in order to get an orthogonal basis of
the desired subspace. Following this, Tinvit replaces the $k + 1$ equal approximations

$$\hat\lambda_j = \hat\lambda_{j+1} = \cdots = \hat\lambda_{j+k}$$ by

$$\hat\lambda_j < \hat\lambda_j + \varepsilon\|T\|_R < \cdots < \hat\lambda_j + k\varepsilon\|T\|_R,$$
where $\|T\|_R = \max_i \left(|T_{ii}| + |T_{i,i+1}|\right) \le \|T\|_1$.

We now give an example where this perturbation is too big. As a result, the shifts
used to compute the eigenvectors are quite different from the computed eigenvalues
and prevent the convergence criterion from being attained.

Example 2.8.2 [Excessive Perturbation.] Using LAPACK's test matrix generator [36],
we generated a $200 \times 200$ tridiagonal matrix such that
\[ \lambda_1 \approx \cdots \approx \lambda_{100} \approx \lambda_{101} \approx \cdots \approx \lambda_{199} \approx \varepsilon, \qquad \lambda_n = 1, \]
where $\varepsilon \approx 1.2 \times 10^{-7}$ (this run was in single precision). $\|T\|_R = O(1)$, and the shift
used by Tinvit to compute $v_{199}$ is
\[ \sigma \approx \varepsilon + 198\,\varepsilon\|T\|_R \approx 2.3 \times 10^{-5}. \]
Since $|\sigma - \lambda_{199}|$ can be as large as
\[ |\sigma - \hat\lambda_{199}| + |\hat\lambda_{199} - \lambda_{199}| \approx 4.6 \times 10^{-5}, \]
the norm growth when solving (2.8.9) is not big enough to meet the convergence
criterion of (2.8.11). Tinvit flags this as an error and returns ierr = $-199$. □
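
The size of this perturbation is easy to reproduce. The following is only an illustrative sketch, not Tinvit's code: it assumes $\|T\|_R = 1$ (the text only states $O(1)$) and takes all 199 computed eigenvalues to be exactly $\varepsilon$; the variable names are hypothetical.

```python
import numpy as np

eps = float(np.finfo(np.float32).eps)   # single precision unit, about 1.2e-7
norm_R = 1.0                            # assumed value of ||T||_R (the text says O(1))
lam_hat = eps                           # 199 coincident computed eigenvalues, taken equal to eps

# Absolute perturbation in the style described above:
# lam_hat, lam_hat + eps*||T||_R, ..., lam_hat + 198*eps*||T||_R
shifts = lam_hat + np.arange(199) * eps * norm_R

print("shift for the 199th eigenvector:", shifts[-1])
print("shift / eigenvalue             :", shifts[-1] / lam_hat)
```

Under these assumptions the last shift is two orders of magnitude larger than the eigenvalue it is meant to approximate, which is the loss of accuracy the example describes.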
Clearly, perturbing the computed eigenvalues relative to the norm can substantially
decrease their accuracy. However, not perturbing them is also not acceptable in
Tinvit as coincident shifts would lead to identical LU factors, identical starting
vectors and hence iterates that are parallel to the eigenvectors computed previously!

In xStein, coincident eigenvalues are also perturbed. However, the perturbations
made are relative. Equal approximations $\hat\lambda_j = \hat\lambda_{j+1} = \cdots = \hat\lambda_{j+k}$ are replaced by
\[ \hat\lambda_j < \hat\lambda_{j+1} + \delta\lambda_{j+1} < \cdots < \hat\lambda_{j+k} + \delta\lambda_{j+k}, \]
where $\delta\lambda_i = \sum_{l=j}^{i-1} \delta\lambda_l + 10\varepsilon|\hat\lambda_i|$. This choice does not perturb the small
eigenvalues drastically, and appears to be better than Tinvit's. On the other hand, this
perturbation is too small in some cases to serve its purpose of finding a linearly
independent basis of the desired subspace (see Example 2.8.7 and Wilkinson's quote
given on page 17). Thus it is easier to say “tweak close eigenvalues” than to find a
satisfactory formula for it.

III. Scaling of right hand side and convergence criterion. For each $\hat\lambda$, the system
\[ (T - \hat\lambda I)y = \tau b \tag{2.8.9} \]
is solved in each iteration. With Tinvit's choice of $\tau$, the residual norm
\[ \frac{\|(T - \hat\lambda I)y\|}{\|y\|} = \frac{\tau\|b\|}{\|y\|} = \frac{n\varepsilon\|T\|_R}{\|y\|}, \tag{2.8.10} \]
where $\|T\|_R = \max_i (|T_{ii}| + |T_{i,i+1}|) \le \|T\|_1$. Tinvit accepts $y$ as an eigenvector if
\[ \|y\| \ge 1. \tag{2.8.11} \]
By (2.8.10), the criterion (2.8.11) ensures that goal (1.1.1) is satisfied, i.e.,
$\|(T - \hat\lambda I)y\|/\|y\| \le n\varepsilon\|T\|_R$.


Suppose that in (2.8.9), $\hat\lambda$ is an approximation to $\lambda_1$. Then by analysis similar
to (2.7.6),
\[ \|y\| = O\!\left(\frac{\tau\|b\|}{|\lambda_1 - \hat\lambda|}\right) = O\!\left(\frac{n\varepsilon\|T\|_R}{|\lambda_1 - \hat\lambda|}\right). \]
Since $\|y\|$ is expected to be larger than 1 at some iteration, Tinvit requires that $\hat\lambda$
be such that $|\lambda_1 - \hat\lambda| = O(n\varepsilon\|T\|)$. If the input approximations to the eigenvalues are
not accurate enough, the iterates do not “converge” and Tinvit flags an error.
When $\hat\lambda$ is very close to an eigenvalue, $\|y\|$ can be huge. Tinvit tries to avoid overflow
by replacing a zero value of the last pivot in the PLU decomposition, $u_{nn}$, by $\varepsilon\|T\|$.
However this measure cannot avoid failure in the following example.

Example 2.8.3 [Perturbing zero values is not enough.]
\[ T = \begin{bmatrix} -\eta & 10 & 0 \\ 10 & 0 & 10 \\ 0 & 10 & \eta(1+\varepsilon) \end{bmatrix} \tag{2.8.12} \]
Here $\varepsilon$ is the machine precision while $\eta$ is the underflow threshold of the machine
($\eta \approx 10^{-308}$ in IEEE double precision arithmetic). $T$ is nearly singular and
\[ \hat\lambda_2 = 0 \text{ and partial pivoting} \implies u_{nn} = \eta\varepsilon. \]
Since $PLUy = \tau b$ and $\tau = n\varepsilon\|T\|_R/\|b\|$,
\[ y(n) \approx \frac{\varepsilon\|T\|}{\eta\varepsilon} = \frac{10}{\eta} \implies \text{overflow!} \]
Note that to exhibit this failure, we required gradual underflow in IEEE arithmetic
(the value $\eta\varepsilon$ should not underflow to zero). However gradual underflow is not
necessary to exhibit such a failure. A similar error, where there is no gradual underflow,
occurs on $(1/\varepsilon)T$ where $T$ is as in (2.8.12). □
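
The arithmetic behind this overflow can be checked directly. The snippet below is a small illustration and not Tinvit itself: it assumes IEEE double precision, takes $\eta$ to be the smallest normalized double as reported by NumPy, and forms the last pivot as $\eta(1+\varepsilon) - \eta$, which is what elimination with partial pivoting produces for the matrix in (2.8.12).

```python
import numpy as np

eps = np.finfo(np.float64).eps     # machine precision, 2**-52
eta = np.finfo(np.float64).tiny    # smallest normalized double, 2**-1022
norm_T = 10.0                      # ||T||_R for the matrix in (2.8.12)

# Last pivot after elimination: eta*(1 + eps) - eta = eta*eps.
# With gradual underflow this is the smallest subnormal number, not zero.
u_nn = eta * (1.0 + eps) - eta

# Last component of the iterate: roughly eps*||T|| / u_nn = 10/eta.
with np.errstate(over="ignore"):
    y_n = np.float64(eps * norm_T) / np.float64(u_nn)

print("u_nn is nonzero:", u_nn > 0.0)
print("y(n) overflows :", np.isinf(y_n))
```

Because $u_{nn}$ is tiny but nonzero, a test for an exactly zero pivot does not trigger, and the division overflows.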


In xStein, $\tau$ is chosen to be
\[ \tau = \frac{n\|T\|_1 \max(\varepsilon, |u_{nn}|)}{\|b\|_1}, \tag{2.8.13} \]
where $T - \hat\lambda I = PLU$ and $u_{nn}$ is the last diagonal element of $U$ [85]. The significant
difference between this scale factor and the one in Tinvit is the term $\max(\varepsilon, |u_{nn}|)$
instead of $\varepsilon$. xStein accepts the iterate $y$ as a computed eigenvector if
\[ \|y\|_\infty \ge \sqrt{\frac{1}{10n}}. \tag{2.8.14} \]
The above choice of scale factor in xStein introduces a serious error not present in
Tinvit. Suppose $\hat\lambda$ approximates $\lambda_1$. When $\hat\lambda = \lambda_1$, it can be proved that $u_{nn}$ must
be zero in exact arithmetic. We now examine the values $u_{nn}$ may take when $\hat\lambda \ne \lambda_1$.
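Since $T - \hat\lambda I = PLU$,
\[ U^{-1}L^{-1} = (T - \hat\lambda I)^{-1}P \implies e_n^T U^{-1}L^{-1}e_n = e_n^T (T - \hat\lambda I)^{-1}Pe_n. \]
Since $L$ is unit lower triangular, $L^{-1}e_n = e_n$. Letting $Pe_n = e_k$ and $T = V\Lambda V^T$, we
get
\[ \frac{1}{u_{nn}} = e_n^T V(\Lambda - \hat\lambda I)^{-1}V^T e_k \implies \frac{1}{u_{nn}} = \frac{v_{n1}v_{k1}}{\lambda_1 - \hat\lambda} + \sum_{i=2}^{n}\frac{v_{ni}v_{ki}}{\lambda_i - \hat\lambda}, \tag{2.8.15} \]
where $v_{ki}$ denotes the $k$th component of $v_i$. By examining the above equation, we see
how the choice of scale factor in xStein opens up a Pandora's box. Equation (2.8.15)
says that for $u_{nn}$ to be small, $|\hat\lambda - \lambda_1| \ll |v_{n1}v_{k1}|$.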


Example 2.8.4 [A code may fail but should never lie.] Consider
\[ T = \begin{bmatrix} 1 & \sqrt{\varepsilon} & 0 \\ \sqrt{\varepsilon} & 7\varepsilon/4 & \varepsilon/4 \\ 0 & \varepsilon/4 & 3\varepsilon/4 \end{bmatrix} \tag{2.8.16} \]
where $\varepsilon$ is about machine precision ($\varepsilon \approx 2.2 \times 10^{-16}$ in IEEE double precision
arithmetic). $T$ has eigenvalues near $\varepsilon/2$, $\varepsilon$, $1 + \varepsilon$. Suppose $\hat\lambda$ is incorrectly input as 2. Then
by (2.8.13) and (2.8.15),
\[ \hat\lambda = 2 \implies |u_{nn}| = O(1) \implies \tau = O(1)! \]
Clearly, this value of $\tau$ does not ensure a large norm growth when the stopping
criterion (2.8.14) is satisfied in solving (2.8.9). As a result, any arbitrary vector can achieve
the “convergence” criterion of (2.8.14) and be output as an approximate eigenvector.
In a numerical run, the vector $[-0.6446 \;\; 0.6373 \;\; 0.4223]^T$ was accepted as an
eigenvector by xStein even though it is nowhere close to any eigenvector of $T$! □
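
The dependence of $u_{nn}$ on the quality of $\hat\lambda$ can be observed numerically. The sketch below is not LAPACK code: `last_pivot` is a hypothetical helper that performs plain Gaussian elimination with partial pivoting on a small dense matrix, and the quantity printed as the scale factor is $n\|T\|_1\max(\varepsilon,|u_{nn}|)$, i.e. it assumes a right hand side normalized so that the norm of $b$ in the scale factor equals 1.

```python
import numpy as np

def last_pivot(A):
    """u_nn from Gaussian elimination with partial pivoting (hypothetical helper)."""
    U = np.array(A, dtype=float)
    n = U.shape[0]
    for j in range(n - 1):
        p = j + int(np.argmax(np.abs(U[j:, j])))   # row with the largest entry in column j
        if p != j:
            U[[j, p]] = U[[p, j]]                   # row interchange
        for i in range(j + 1, n):
            U[i, j:] -= (U[i, j] / U[j, j]) * U[j, j:]
    return U[-1, -1]

eps = np.finfo(float).eps
s = np.sqrt(eps)
T = np.array([[1.0, s, 0.0],
              [s, 7 * eps / 4, eps / 4],
              [0.0, eps / 4, 3 * eps / 4]])          # the matrix of (2.8.16)
n = T.shape[0]
norm1 = np.linalg.norm(T, 1)

pivots = {}
for label, lam_hat in (("wrong", 2.0), ("accurate", eps)):
    u_nn = last_pivot(T - lam_hat * np.eye(n))
    pivots[label] = u_nn
    scale = n * norm1 * max(eps, abs(u_nn))         # scale factor with the norm of b taken as 1
    print(label, "lambda_hat =", lam_hat, "|u_nn| =", abs(u_nn), "scale =", scale)
```

With the wrong input the last pivot, and hence the scale factor, is of order one, so the norm test no longer says anything about the residual; with an accurate input the pivot falls below $\varepsilon$ and the scale factor reduces to the Tinvit-like value.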


This example represents one of the more dangerous errors of numerical software: the
software performs erroneous computation but does not flag any error at all. Failure
to handle incorrect input data can have disastrous consequences$^2$. On the above
example, Tinvit correctly flags an error indicating that the computation did not
“converge”. Of course, most of the time the eigenvalues input to xStein will be
quite accurate and the above phenomenon will not occur.

Even if $\hat\lambda$ is a very good approximation to $\lambda_1$, (2.8.15) indicates that $u_{nn}$ may not be
small if $v_{n1}$ is tiny. It is not at all uncommon for a component of an eigenvector of a
tridiagonal matrix to be tiny [136, pp. 317-321]. xStein's choice of scale factor may
lead to unnecessary overflow as shown below.


Example 2.8.5 [Undeserved overflow.] Consider the matrix given in (2.8.16).
The eigenvector corresponding to the eigenvalue $\lambda_3 = 1 + \varepsilon + O(\varepsilon^2)$ is
\[ v_3 = \begin{bmatrix} 1 - \varepsilon/2 + O(\varepsilon^2) \\ \sqrt{\varepsilon} + O(\varepsilon^{3/2}) \\ \varepsilon^{3/2}/4 + O(\varepsilon^{5/2}) \end{bmatrix}. \]
If $\hat\lambda = 1$, then $|(v_{n3}v_{k3})/(\lambda_3 - \hat\lambda)| < \sqrt{\varepsilon}$ and by (2.8.15), $|u_{nn}| = O(\|T\|)$. In such a
case, (2.8.13) implies that $\tau = O(\|T\|^2)$ and if $\|T\| > 1$ the right hand side is scaled
up rather than being scaled down! As a consequence, xStein overflows on the scaled
matrix $\sqrt{\Omega}\,T$ where $\Omega$ is the overflow threshold of the computer ($\Omega = 2^{1023} \approx 10^{308}$
in IEEE double precision arithmetic). □


Note that the above matrix does not deserve overflow. A similar overflow occurrence
(in IEEE double precision arithmetic) on an $8 \times 8$ matrix, with a largest element of
magnitude $2^{484} \approx 10^{145}$, was reported to us by Jeremy DuCroz [48].

The problems reported above can be cured by reverting to the choice of scale
factor in EISPACK's Tinvit.

IV. Orthogonality. Tinvit and xStein use the modified Gram-Schmidt (MGS) procedure
to orthogonalize iterates corresponding to eigenvalues whose separation is less
than $10^{-3}\|T\|$. In order for the orthogonalized vectors to actually be numerically
orthogonal, the vectors must not be parallel prior to the orthogonalization. In the
following example the vectors to be orthogonalized are almost parallel. The next two
examples are reproduced in Case Study A.

$^2$In the summer of 1996, a core dump on the main computer aboard the Ariane 5 rocket was interpreted
as flight data, causing a violent trajectory correction that led to the disintegration of the rocket.


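Example 2.8.6 [Parallel Iterates.] Consider the matrix of (2.8.16). $T$ has the
eigenvalues
\[ \lambda_1 = \varepsilon/2 + O(\varepsilon^2), \quad \lambda_2 = \varepsilon + O(\varepsilon^2), \quad \lambda_3 = 1 + \varepsilon + O(\varepsilon^2). \]
The eigenvalues of $T$ as computed by MATLAB's eig$^3$ function are
\[ \hat\lambda_1 = \varepsilon, \quad \hat\lambda_2 = \varepsilon, \quad \hat\lambda_3 = 1 + \varepsilon. \]
We perturb $\hat\lambda_2$ to $\varepsilon(1 + \varepsilon)$ and input these approximations to Tinvit to demonstrate
this failure (equal approximations input to Tinvit are perturbed by approximately
$\varepsilon\|T\|$, see Example 2.8.2).

The first eigenvector is computed by Tinvit as
\[ y_1 = (T - \hat\lambda_1 I)^{-1}b_1. \]
In exact arithmetic (taking $b_1 = \sum_i \xi_i v_i$),
\[ y_1 = \frac{\xi_2}{\lambda_2 - \hat\lambda_1}\left( v_2 + \frac{\xi_1}{\xi_2}\,\frac{\lambda_2 - \hat\lambda_1}{\lambda_1 - \hat\lambda_1}\,v_1 + \frac{\xi_3}{\xi_2}\,\frac{\lambda_2 - \hat\lambda_1}{\lambda_3 - \hat\lambda_1}\,v_3 \right) = \frac{1}{O(\varepsilon^2)}\left( v_2 + O(\varepsilon)v_1 + O(\varepsilon^2)v_3 \right) \]
provided $(\xi_1/\xi_2)$ and $(\xi_3/\xi_2)$ are $O(1)$. Due to the inevitable roundoff errors in finite
precision, the best we can hope to compute is
\[ \hat y_1 = \frac{1}{O(\varepsilon^2)}\left( v_2 + O(\varepsilon)v_1 + O(\varepsilon)v_3 \right). \]
This vector is normalized to remove the $1/O(\varepsilon^2)$ factor and give $\hat v_1$. The second
eigenvector is computed as
\[ y_2 = (T - \hat\lambda_2 I)^{-1}b_2. \]
Since $\hat\lambda_2 \approx \hat\lambda_1$, the computed value of $y_2$ is almost parallel to $\hat v_1$ (assuming that $b_2$ has
a non-negligible component in the direction of $v_2$), i.e.,
\[ \hat y_2 = \frac{1}{O(\varepsilon^2)}\left( v_2 + O(\varepsilon)v_1 + O(\varepsilon)v_3 \right). \]

$^3$MATLAB's eig function computes eigenvalues by the QR algorithm.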
                         Step 1                     Step 2                     Step 3
                 Before MGS   After MGS     Before MGS   After MGS     Before MGS   After MGS
Iterate (y)      1.05·10^-8   1.04·10^-8    1.05·10^-8   1.05·10^-8    1.05·10^-8   1.05·10^-8
                 -0.707       -0.697        -0.707       -0.7069       -0.707       -0.707108
                 -0.707        0.716        -0.707        0.7073       -0.707        0.707105
|y^T \hat v_1|   1.000        0.0014        1.000        3.0·10^-4     1.000        1.9·10^-6
|y^T \hat v_3|   3.9·10^-24   4.1·10^-11    5.8·10^-25   2.9·10^-12    2.2·10^-24   1.1·10^-13

Table 2.1: Summary of xStein's iterations to compute the second eigenvector of T.

Since $\hat\lambda_1$ and $\hat\lambda_2$ are nearly coincident, $\hat y_2$ is orthogonalized against $\hat v_1$ by the MGS
process in an attempt to reveal the second eigenvector:
\begin{align*}
z &= \hat y_2 - (\hat y_2^T \hat v_1)\hat v_1 \tag{2.8.17} \\
  &= \frac{1}{O(\varepsilon^2)}\left( O(\varepsilon)v_1 + O(\varepsilon)v_2 + O(\varepsilon)v_3 \right), \\
\hat v_2 &= z/\|z\|.
\end{align*}
Clearly $z$ is not orthogonal to $\hat v_1$. But Tinvit accepts $z$ as having “converged” since
$\|z\|$ is big enough to satisfy the convergence criterion (2.8.11) even after the severe
cancellation in (2.8.17). The above observations are confirmed by a numerical run in
double precision arithmetic wherein the first two eigenvectors of the matrix in (2.8.16)
as output by Tinvit had a dot product of 0.0452! □


Unlike Tinvit, xStein computes each eigenvector from a different random starting
vector. The hope is to get greater linear independence of the iterates before
orthogonalization by the MGS method [87]. However, as we now show, the Tinvit error as
reported above persists.

Example 2.8.7 [Linear Dependence Persists.] Consider again the matrix $T$
given in (2.8.16). The eigenvalues input to xStein are computed by the eig function
in MATLAB as $\hat\lambda_1 = \hat\lambda_2 = \varepsilon$ and $\hat\lambda_3 = 1 + \varepsilon$. As in Tinvit the first eigenvector
computed as $\hat v_1$ is almost parallel to $v_2$. The iterations to compute the second
eigenvector are summarized in Table 2.1.
This behavior is very similar to that of Tinvit (see Example 2.8.6). xStein does two
more iterations than Tinvit and alleviates the problem slightly, but a dot product
of $1.9 \times 10^{-6}$ between computed eigenvectors is far from satisfactory (this run was in
double precision). □

xStein avoids the overflow problems of Tinvit shown in Example 2.8.3. It checks
to see if overflow would occur, and if so, perturbs tiny entries on the diagonal of $U$ [85]. This
check is in the inner loop when solving $Uy = x$ where $x = \tau L^{-1}P^{-1}b$. Coupled with the
extra iterations done after convergence, this results in xStein being slower than Tinvit.
On an assorted collection of test matrices of sizes 50 to 1000, we observed xStein to be 2-3
times slower than Tinvit. However xStein was more accurate than Tinvit in general.

2.9  Our  Approach 

As we observed in the previous section, the computer implementation of a seemingly
straightforward task can lead to surprising failures. We want to avoid such failures
and indicate our approach to the various aspects of inverse iteration discussed above.

I. Choice of shift. Of the various issues discussed in Section 2.7, the choice of starting
vector and convergence criterion have been extensively studied [136, 118, 119, 87].
Surprisingly, the choice of shift for inverse iteration seems to have drawn little attention.
We feel that the shift is probably the most important variable in inverse
iteration. Examples 2.8.6 and 2.8.7 highlight the importance of shifts that are as
accurate as possible. We present our solution in Chapter 4.

II. Direction of starting vector and scaling of right hand side. This problem is
solved.  It  is  now  possible  to  deterministically  find  a  starting  vector  that  is  guaranteed 
to  have  an  above  average  component  in  the  direction  of  the  desired  eigenvector  v. 
Knowing  the  position  of  a  large  component  of  v  also  enables  us  to  avoid  the  possibility 
of  overflow  in  the  eigenvector  computation.  See  Chapter  3  for  more  details. 

III. Convergence criterion and Orthogonality. It is easy to find a criterion that
guarantees small residual norms (goal (1.1.1)). However, as we saw in earlier sections,
the goal of orthogonality (1.1.2) can lead to a myriad of problems. Most of the “difficult”
errors in the EISPACK and LAPACK implementations arise due to the explicit
orthogonalization of iterates when eigenvalues are close. In [20], Chandrasekaran
explains the theory behind some of these failures, and proposes an alternate version
of inverse iteration that is provably correct. However, this version is more involved
and potentially requires much more orthogonalization than in existing implementations.
We want to take a different approach to orthogonality, and this is discussed in
Chapters 4, 5 and 6.
Chapter 3

Computing the eigenvector of an isolated eigenvalue


Given an eigenvalue $\lambda$ of a tridiagonal matrix $\tilde J$, a natural way to solve for $v \ne 0$
in
\[ (\tilde J - \lambda I)v = 0 \tag{3.0.1} \]
is to set $v(1) = 1$ and to use the first equation of (3.0.1) to determine $v(2)$, and the second
to determine $v(3)$, using $v(1)$ and $v(2)$. Proceeding like this, the $r$th equation may be used
to determine $v(r + 1)$ and thus $v$ may be obtained without actually making use of the $n$th
equation which will be satisfied automatically since the system (3.0.1) is singular.

It would be equally valid to begin with $\tilde v(n) = 1$ and to take the equations in
reverse order to compute $\tilde v(n-1), \ldots, \tilde v(2), \tilde v(1)$ in turn without using the first equation
in (3.0.1). When normalized in the same way $v$ and $\tilde v$ will yield the same eigenvector in
exact arithmetic.

The method described above is ‘obvious’ and was mentioned by W. Givens in
1957, see [61]. It often gives good results when realized on a computer but, at other times,
delivers vectors pointing in completely wrong directions for the following reason. It is
rare that an eigenvalue of a tridiagonal (or any other) matrix is representable in limited
precision. Consequently the systems such as (3.0.1) that are to be solved are not singular
and, in (3.0.1), the unused equation will not be satisfied automatically even if the solutions
of the other equations were obtained exactly. The two methods given above result in solving
\[ (\tilde J - \hat\lambda I)x^{(n)} = \delta_n e_n \tag{3.0.2} \]
and
\[ (\tilde J - \hat\lambda I)x^{(1)} = \delta_1 e_1 \tag{3.0.3} \]
respectively, where $\hat\lambda$ is an approximation to $\lambda$ and $\delta_1$, $\delta_n$ are the residuals of the equations
that are not satisfied. Note that (3.0.2) and (3.0.3) show that the natural method may be
thought of as doing one step of inverse iteration as given in (2.7.4) with the starting vector
equal to $e_1$ or $e_n$. We now present an example illustrating how this natural method can
fail.


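Example 3.0.1 [Choice of $e_1$ is wrong.] Consider
\[ T = \begin{bmatrix} 3\varepsilon/4 & \varepsilon/4 & 0 \\ \varepsilon/4 & 7\varepsilon/4 & \sqrt{\varepsilon} \\ 0 & \sqrt{\varepsilon} & 1 \end{bmatrix} \]
where $\varepsilon$ is of the order of the machine precision. An eigenvalue of $T$ is
\[ \lambda = 1 + \varepsilon + O(\varepsilon^2), \tag{3.0.4} \]
and its associated eigenvector is
\[ v = \begin{bmatrix} \varepsilon^{3/2}/4 + O(\varepsilon^{5/2}) \\ \sqrt{\varepsilon} + O(\varepsilon^{3/2}) \\ 1 - \varepsilon/2 + O(\varepsilon^2) \end{bmatrix}. \tag{3.0.5} \]
The vector obtained in exact arithmetic by ignoring the first equation, with the
approximation $\hat\lambda = 1$, is
\[ x^{(1)} = \begin{bmatrix} -4/\sqrt{\varepsilon} \\ 0 \\ 1 \end{bmatrix}, \]
but
\[ \frac{\|(T - \hat\lambda I)x^{(1)}\|_2}{\|x^{(1)}\|_2} = \frac{4 - 3\varepsilon}{\sqrt{16 + \varepsilon}} \to 1 \quad \text{as } \varepsilon \to 0. \]
□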
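
The two variants of the natural method can be tried on this matrix. The code below is a minimal sketch rather than a library routine: the function names are hypothetical, the recurrences assume a tridiagonal matrix with nonzero off-diagonal entries, and $\varepsilon$ is taken to be the double precision unit roundoff reported by NumPy.

```python
import numpy as np

def omit_last_equation(A):
    """Set x(1) = 1 and use equations 1, ..., n-1 of A x = 0 (A tridiagonal)."""
    n = A.shape[0]
    x = np.zeros(n)
    x[0] = 1.0
    x[1] = -A[0, 0] * x[0] / A[0, 1]
    for i in range(1, n - 1):
        x[i + 1] = -(A[i, i - 1] * x[i - 1] + A[i, i] * x[i]) / A[i, i + 1]
    return x

def omit_first_equation(A):
    """Set x(n) = 1 and use equations n, ..., 2 of A x = 0, in reverse order."""
    n = A.shape[0]
    x = np.zeros(n)
    x[-1] = 1.0
    x[-2] = -A[-1, -1] * x[-1] / A[-1, -2]
    for i in range(n - 2, 0, -1):
        x[i - 1] = -(A[i, i] * x[i] + A[i, i + 1] * x[i + 1]) / A[i, i - 1]
    return x

def relative_residual(A, x):
    """||A x||_2 / ||x||_2."""
    return np.linalg.norm(A @ x) / np.linalg.norm(x)

eps = np.finfo(float).eps
s = np.sqrt(eps)
T = np.array([[3 * eps / 4, eps / 4, 0.0],
              [eps / 4, 7 * eps / 4, s],
              [0.0, s, 1.0]])
A = T - 1.0 * np.eye(3)              # shift by the approximation lambda_hat = 1

x_first = omit_first_equation(A)     # residual left in equation 1 (starting vector e_1)
x_last = omit_last_equation(A)       # residual left in equation n (starting vector e_n)

res_first = relative_residual(A, x_first)
res_last = relative_residual(A, x_last)
print("omit first equation:", res_first)
print("omit last equation :", res_last)
```

Omitting the first equation leaves a relative residual close to 1, while omitting the last one, where the eigenvector has its largest component, leaves a residual of the order of the machine precision.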


In [136, pp. 319-321] and [134], Wilkinson presents a similar example where omitting
the last equation results in a poor approximation to an eigenvector. His example matrix
is
\[ W_{21}^- = \begin{bmatrix} 10 & 1 & & & & \\ 1 & 9 & 1 & & & \\ & 1 & 8 & \ddots & & \\ & & \ddots & \ddots & 1 & \\ & & & 1 & -9 & 1 \\ & & & & 1 & -10 \end{bmatrix}. \tag{3.0.6} \]
Letting $\lambda$ be the largest eigenvalue of $W_{21}^-$, Wilkinson shows that there is no 9-figure
approximation $\hat\lambda$ to $\lambda$ for which the exact normalized solution of the first 20 equations of
$(W_{21}^- - \hat\lambda I)x = 0$ is anywhere close to the desired eigenvector. We now repeat his analysis.
Suppose
\[ (\tilde J - \hat\lambda I)x^{(r)} = e_r, \]
where $\hat\lambda$ approximates $\lambda_j$. Writing $e_r$ in terms of the eigenvectors, $e_r = \sum_{i=1}^{n} \xi_i v_i$, we get
\[ x^{(r)} = (\tilde J - \hat\lambda I)^{-1}e_r = \frac{\xi_j}{\lambda_j - \hat\lambda}\left( v_j + \sum_{i \ne j}\frac{\xi_i}{\xi_j}\,\frac{\lambda_j - \hat\lambda}{\lambda_i - \hat\lambda}\,v_i \right). \tag{3.0.7} \]
Fundamental limitations of finite precision arithmetic dictate that $|\hat\lambda - \lambda_j|$ cannot be
better than $O(n\varepsilon\|\tilde J\|)$, in general, where $\varepsilon$ is the machine precision. Even when
$|\hat\lambda - \lambda_j| \approx n\varepsilon\|\tilde J\|$, (3.0.7) shows that $x^{(r)}$ may not be close to $v_j$'s direction if
$\xi_j = e_r^* v_j = v_j(r)$ is tiny. In Example 3.0.1, $r = 1$ and $\xi_j \approx O(\varepsilon^{3/2})$ while in Wilkinson's
example, $r = n = 21$ and $\xi_j \approx 10^{-17}$. As a result, in both these cases the natural method
does not approximate the eigenvector well.

Given an accurate $\hat\lambda$, (3.0.7) implies that $x^{(r)}$ will be a good approximation to $v_j$
provided the $r$th component of $v_j$ is not small. Wilkinson notes this in [136, p.318] and
concludes,

‘Hence if the largest component of $u_k$ [$v_j$ in our notation] is the $r$th, then it
is the $r$th equation which should be omitted when computing $u_k$. This result
is instructive but not particularly useful, since we will not know a priori the
position of the largest component of $u_k$. In fact, $u_k$ is precisely what we are
attempting to compute!’


In the absence of a reliable and cheap procedure to find $r$, Wilkinson compromised
by choosing the starting vector for inverse iteration as $PLe$, where $\tilde J - \hat\lambda I = PLU$ and $e$
is the vector of all 1's (this led to the choice made in EISPACK [128]). A random starting
vector is used in the LAPACK implementation of inverse iteration [87].

In this chapter, we show the following:

1. Sections 3.1 and 3.2 introduce twisted factorizations that provide an answer to
Wilkinson's problem of choosing which equation to omit. This is due to pioneering work by
Fernando, and enables us to discard LAPACK's random choice of starting vector to
compute an eigenvector of an isolated eigenvalue [57, 117].

2. In Section 3.3 we show how to adapt the results of Sections 3.1 and 3.2 when triangular
factorizations don't exist. Some of the material presented in Sections 3.1-3.3 has
appeared in [117].

3. Section 3.4 shows how to eliminate the divisions in the method outlined in Section 3.2.

4. In Section 3.5, we introduce twisted Q factorizations and give an alternate method
to compute an eigenvector. We also show how the perfect shift strategy suggested by
Parlett [110, p.173] can be made to succeed.

5. In Section 3.6, we show that twisted factorizations may also be used to detect
singularity of Hessenberg and dense matrices.

In preparation for the upcoming theory, we state a few well known results without
proof. The informed reader should be able to furnish their proofs without much difficulty.
We expect that the reader knows the LDU decomposition and the theorem concerning existence
and uniqueness of triangular factorization and the expressions for the pivots, as the diagonal
entries of $D$ are often called.

Lemma 3.0.1 Let $J$ be an unreduced tridiagonal matrix and $v$ be an eigenvector. Then the
first and last components of $v$ are non-zero.

Lemma 3.0.2 Let $J$ be an unreduced normal tridiagonal matrix. Then every eigenvalue of
$J$ is simple.

Lemma 3.0.3 Let $J = \tilde J - \lambda I$ be an unreduced singular tridiagonal matrix where $\lambda$ is an
eigenvalue of $\tilde J$ and $v$ is the corresponding eigenvector. Then the following are equivalent.

i. $J$ admits the LDU and UDL triangular factorizations.
ii. All strictly leading and trailing submatrices of $J$ are nonsingular.

Furthermore when $J$ is normal, the following is equivalent to the above:
iii. No component of $v$ is zero.

Proof. The latter is not so obvious and follows from the fact that when $J$ is normal
\[ |v(r)|^2\,\chi'(\lambda) = \chi^{1:r-1}(\lambda)\cdot\chi^{r+1:n}(\lambda) \tag{3.0.8} \]
where $\chi^{l:m}(\mu) = \det(\mu I - \tilde J^{l:m})$. See [110, Section 7-9] to derive the above. □


3.1  Twisted  Factorizations 


Despite Wilkinson's pessimism, we ask the question: can we find reliable indicators
of the sizes of various components of a normalized eigenvector without knowing the
eigenvector itself? We turn to triangular factorization in search of an answer and examine
the LDU decomposition of $\tilde J - \hat\lambda I$. In this representation, both $L$ and $U$ have 1's on the
diagonal. We will denote the $i$th diagonal element of $D$ by $D(i)$ and $L(i+1,i)$, $U(i,i+1)$
by $L(i)$ and $U(i)$ respectively.

If $\hat\lambda$ is an eigenvalue of $\tilde J$, the following theorem implies that $D(n)$ must be zero
in exact arithmetic.


Theorem 3.1.1 Let $B$ be a matrix of order $n$ such that the triangular factorization
\[ B = LDU \]
exists, i.e., its principal submatrices $B^{1:i}$, $i = 1, 2, \ldots, n-1$ are nonsingular. Then if $B$ is
singular,
\[ D(n) = 0. \]

Proof. The expression
\[ D(n) = \frac{\det(B)}{\det(B^{1:n-1})} \]
implies that if $B$ is singular, $D(n) = 0$. □

We now examine the values $D(n)$ can take when $\hat\lambda$ is not an eigenvalue. Suppose
$\hat\lambda$ approximates $\lambda_j$. Since $\tilde J - \hat\lambda I = LDU$,
\[ U^{-1}D^{-1}L^{-1} = (\tilde J - \hat\lambda I)^{-1} \implies e_n^* U^{-1}D^{-1}L^{-1}e_n = e_n^*(\tilde J - \hat\lambda I)^{-1}e_n. \]
Since $L$ and $U^*$ are unit lower triangular, $Le_n = e_n$ and $U^*e_n = e_n$. Letting $\tilde J = V\Lambda V^*$, we
get
\[ \frac{1}{D(n)} = e_n^* V(\Lambda - \hat\lambda I)^{-1}V^* e_n \implies \frac{1}{D(n)} = \frac{|v_j(n)|^2}{\lambda_j - \hat\lambda} + \sum_{i \ne j}\frac{|v_i(n)|^2}{\lambda_i - \hat\lambda}, \tag{3.1.9} \]
where $v_i(n)$ denotes the $n$th component of $v_i$. (3.1.9) implies that when $\hat\lambda$ is close to $\lambda_j$
and $\lambda_j$ is isolated, a large value of $v_j(n)$ results in $D(n)$ being small. But, when $v_j(n)$ is
tiny, $D(n)$ can be as large as $O(\|\tilde J\|)$. Thus the value of $D(n)$ reflects the value of the last
component of the desired eigenvector in addition to the accuracy of $\hat\lambda$.
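
Relation (3.1.9) can be checked numerically on a small symmetric example. The following is an illustrative sketch under stated assumptions: the matrix is the $6 \times 6$ second-difference matrix (diagonal 2, off-diagonals $-1$), chosen here only because none of its leading submatrices is singular at the shift used, and the shift is a deliberately rough approximation to the third eigenvalue.

```python
import numpy as np

n = 6
a = 2.0 * np.ones(n)            # diagonal of the test matrix (an assumed example)
b = -1.0 * np.ones(n - 1)       # off-diagonal
J = np.diag(a) + np.diag(b, 1) + np.diag(b, -1)

lam, V = np.linalg.eigh(J)      # eigenvalues ascending, orthonormal eigenvectors in columns
shift = lam[2] + 1e-3           # hypothetical approximation to the third eigenvalue

# Last pivot D(n) of the LDU factorization of J - shift*I (no pivoting).
d = a[0] - shift
for i in range(1, n):
    d = (a[i] - shift) - b[i - 1] ** 2 / d

# Right-hand side of (3.1.9): sum over i of |v_i(n)|^2 / (lambda_i - shift).
rhs = np.sum(V[-1, :] ** 2 / (lam - shift))

print("1/D(n)          :", 1.0 / d)
print("eigen-expansion :", rhs)
```

The sum is dominated by the term belonging to the eigenvalue nearest the shift, so $D(n)$ is small here because the corresponding eigenvector has a sizeable last component.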

We can consider another triangular factorization of a matrix, i.e., the UDL
decomposition that may be obtained by taking rows in decreasing order of their index. To
differentiate this process from the standard LDU factorization, we make a notational
innovation. We will use $+$ to indicate a process taking rows in increasing order and $-$ to
indicate the process going in decreasing order, i.e., LDU will henceforth be written as
$L_+D_+U_+$ while UDL will be written as $U_-D_-L_-$.

By repeating the analysis that led to (3.1.9), it can be shown that in the $U_-D_-L_-$
factorization, $D_-(1)$ is small when $v_j(1)$ is large, but not otherwise. Thus, the value of
$D_-(1)$ mirrors the value of $v_j(1)$. Besides $D_+(n)$ and $D_-(1)$ can we find quantities that
indicate the magnitudes of other components of the desired eigenvector?

Natural candidates in this quest are elements of various twisted factorizations.
Instead of completing the $L_+D_+U_+$ factorization of a matrix, we can stop it at an
intermediate row $k$. Now we can start the $U_-D_-L_-$ factorization from the bottom of the matrix,
going up till we have a singleton in the $k$th row, which we will denote by $\gamma_k$. Such a twisted
factorization, with $n = 5$ and $k = 3$ is shown in Figure 3.1.

We formally define a twisted triangular factorization of a tridiagonal matrix $J$ at
twist position $k$ as
\[ J = N_k D_k \tilde N_k, \tag{3.1.10} \]
where
\[ N_k = \begin{bmatrix}
1 & & & & & & & & \\
L_+(1) & 1 & & & & & & & \\
 & \ddots & \ddots & & & & & & \\
 & & L_+(k-2) & 1 & & & & & \\
 & & & L_+(k-1) & 1 & U_-(k) & & & \\
 & & & & & 1 & U_-(k+1) & & \\
 & & & & & & \ddots & \ddots & \\
 & & & & & & & 1 & U_-(n-1) \\
 & & & & & & & & 1
\end{bmatrix}, \]
$\tilde N_k$ has the same non-zero structure as $N_k^*$ with $L_+(1), \ldots, L_+(k-1), U_-(k), \ldots, U_-(n-1)$
being replaced by $U_+(1), \ldots, U_+(k-1), L_-(k), \ldots, L_-(n-1)$ respectively,
\[ D_k = \mathrm{diag}(D_+(1), \ldots, D_+(k-1), \gamma_k, D_-(k+1), \ldots, D_-(n)) \]
and
\[ J = L_+D_+U_+ = U_-D_-L_-. \]

Figure 3.1: Twisted Triangular Factorization at k = 3 (the next elements to be annihilated
are circled)

The perceptive reader may already have grasped the significance of $\gamma_k$ which we
now explain.


Theorem 3.1.2 (Double Factorization) Let $J$ be a tridiagonal $n \times n$ complex matrix
that permits triangular factorization in both increasing and decreasing order of rows:
\[ L_+D_+U_+ = J = U_-D_-L_-. \tag{3.1.11} \]
Consider the twisted factorization at $k$ given by (3.1.10). Then for $1 \le k \le n$,
\[ \gamma_k = D_+(k) + D_-(k) - J_{kk} \tag{3.1.12} \]
and if $J$ is invertible,
\[ \frac{1}{\gamma_k} = e_k^* J^{-1} e_k. \tag{3.1.13} \]

Proof. By construction of the twisted factorization,
\begin{align*}
\gamma_k &= -J_{k-1,k}L_+(k-1) + J_{kk} - J_{k+1,k}U_-(k) \tag{3.1.14} \\
         &= (J_{kk} - J_{k-1,k}L_+(k-1)) - J_{kk} + (J_{kk} - J_{k+1,k}U_-(k)) \\
         &= D_+(k) - J_{kk} + D_-(k)
\end{align*}
for $1 \le k \le n$. Here we take $J_{1,0} = J_{0,1} = J_{n+1,n} = J_{n,n+1} = 0$, and $L_+(0) = U_-(n) = 0$.
Since $N_k e_k = e_k$ and $e_k^*\tilde N_k = e_k^*$,
\[ e_k^* J^{-1} e_k = e_k^* \tilde N_k^{-1} D_k^{-1} N_k^{-1} e_k = \frac{1}{\gamma_k}. \]
□

Note that $\gamma_n = D_+(n)$ and $\gamma_1 = D_-(1)$. The above theorem shows how to get all
possible $n$ twisted factorizations at the cost of $2n$ divisions. The alert reader would have
noted that the above theorem makes no assumption about the nearness to singularity of $J$.
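
The additive formula (3.1.12) translates into a few lines of code. What follows is a minimal sketch, not the thesis's algorithm: the function name `twisted_pivots` and the test matrix are hypothetical, the matrix is chosen strictly diagonally dominant so that all pivots exist, and no safeguards against zero pivots are included.

```python
import numpy as np

def twisted_pivots(a, b, c):
    """Return (D_plus, D_minus, gamma) for the tridiagonal J with diagonal a,
    superdiagonal b (J[k, k+1]) and subdiagonal c (J[k+1, k]).
    Assumes every pivot is nonzero, i.e. hypothesis (3.1.11) holds."""
    a = np.asarray(a, dtype=float)
    n = len(a)
    d_plus = np.empty(n)
    d_minus = np.empty(n)
    d_plus[0] = a[0]
    for k in range(1, n):                     # rows in increasing order
        d_plus[k] = a[k] - b[k - 1] * c[k - 1] / d_plus[k - 1]
    d_minus[n - 1] = a[n - 1]
    for k in range(n - 2, -1, -1):            # rows in decreasing order
        d_minus[k] = a[k] - b[k] * c[k] / d_minus[k + 1]
    gamma = d_plus + d_minus - a              # formula (3.1.12)
    return d_plus, d_minus, gamma

a = [4.0, 5.0, 6.0, 5.0, 4.0]
b = [1.0, -2.0, 1.0, 2.0]
c = [2.0, 1.0, -1.0, 1.0]
J = np.diag(a) + np.diag(b, 1) + np.diag(c, -1)

d_plus, d_minus, gamma = twisted_pivots(a, b, c)
inv_diag = np.diag(np.linalg.inv(J))          # dense inverse, used only as a check
print("gamma          :", gamma)
print("1/diag(J^{-1}) :", 1.0 / inv_diag)
```

The two sweeps use one division per row each, which matches the count of about $2n$ divisions mentioned above; the dense inverse appears only to confirm (3.1.13) on this small example.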

We want to emphasize that in the theory developed here the existence of the
triangular factorizations (3.1.11) is not restrictive. Neither is the occasional requirement that
$J$ be nonsingular. In fact, the singularity of $J$ is desirable in our application of computing
an eigenvector. As we will show in Section 3.3, in the absence of these requirements, all
the theory developed in this section and the next is easily extended if $\infty$ is added to the
number system.

The following corollary gives alternate expressions for $\gamma_k$.
Corollary 3.1.1 With the notation of Theorem 3.1.2, for $1 \le k \le n$,
\[ \gamma_k = \begin{cases}
D_+(k) - J_{kk} + D_-(k), \\
-J_{k-1,k}L_+(k-1) + J_{kk} - J_{k+1,k}U_-(k), \\
-J_{k,k-1}U_+(k-1) + J_{kk} - J_{k,k+1}L_-(k), \\
D_+(k) - J_{k+1,k}U_-(k), \\
-J_{k-1,k}L_+(k-1) + D_-(k).
\end{cases} \]
For $k = 1$ and $k = n$ omit terms with invalid indices.

Proof. The first and second expressions are just (3.1.12) and (3.1.14). The others come
from rewriting (3.1.14) as
\[ \gamma_k = -J_{k,k-1}J_{k-1,k}/D_+(k-1) + J_{kk} - J_{k,k+1}J_{k+1,k}/D_-(k+1) \tag{3.1.15} \]
and using $J_{k,k+1} = U_-(k)D_-(k+1) = D_+(k)U_+(k)$ etc., and the formula for the backward
pivots, $D_-(k) = J_{kk} - J_{k,k+1}J_{k+1,k}/D_-(k+1)$, etc. □

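As shown below, double factorization makes available the so called Newton correction.

Corollary 3.1.2 Assume the hypothesis (3.1.11) of Theorem 3.1.2, and let $J$ be nonsingular.
Then
\[ \sum_{k=1}^{n} \gamma_k^{-1} = \mathrm{Trace}(J^{-1}) = -\frac{\chi'(0)}{\chi(0)}, \tag{3.1.16} \]
where $\chi(\mu) = \det(\mu I - J)$.

Proof. By (3.1.13) in Theorem 3.1.2,
\[ \sum_{k} \gamma_k^{-1} = \sum_{k} e_k^* J^{-1} e_k = \mathrm{Trace}(J^{-1}) = -\frac{\chi'(0)}{\chi(0)} \]
since
\[ \frac{\chi'(\mu)}{\chi(\mu)} = \frac{\sum_i \prod_{j \ne i}(\mu - \lambda_j)}{\prod_j (\mu - \lambda_j)} = \sum_i \frac{1}{\mu - \lambda_i}. \]
□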
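
As a small worked check of (3.1.16), consider a generic $2 \times 2$ matrix. The entries $a, b, c, d$ are introduced here purely for illustration and are assumed to satisfy $a \ne 0$, $d \ne 0$ and $ad - bc \ne 0$, so that both factorizations exist and $J$ is nonsingular.

```latex
\begin{align*}
J &= \begin{bmatrix} a & b \\ c & d \end{bmatrix}, \qquad
\gamma_1 = D_-(1) = a - \frac{bc}{d} = \frac{ad - bc}{d}, \qquad
\gamma_2 = D_+(2) = d - \frac{bc}{a} = \frac{ad - bc}{a}, \\
\gamma_1^{-1} + \gamma_2^{-1} &= \frac{a + d}{ad - bc} = \mathrm{Trace}(J^{-1}), \\
\chi(\mu) &= \mu^2 - (a + d)\mu + (ad - bc)
\quad\Longrightarrow\quad
-\frac{\chi'(0)}{\chi(0)} = \frac{a + d}{ad - bc}.
\end{align*}
```

Both sides agree, as the corollary states.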

Double factorization also allows a wide choice of expressions for $\det(J)$, and a recurrence
for computing $\gamma_k$ that involves only multiplications and divisions.
Theorem 3.1.3 Assume the hypothesis of Theorem 3.1.2. Then for $k = 1, \ldots, n$ (omit
invalid indices),
\[ \det(J) = D_+(1)\cdots D_+(k-1)\,\gamma_k\,D_-(k+1)\cdots D_-(n) \tag{3.1.17} \]
and
\[ \gamma_k D_-(k+1) = D_+(k)\gamma_{k+1}. \tag{3.1.18} \]

Proof. From (3.1.10),
\[ \det(J) = \det(N_k)\det(D_k)\det(\tilde N_k) = \det(D_k), \]
and we get (3.1.17). Applying (3.1.17) to two successive $k$'s leads to (3.1.18). □

The following are immediate consequences of (3.1.17).

Corollary 3.1.3 Assuming the hypothesis of Theorem 3.1.2,

\[ \gamma_k = D_+(k) \prod_{i=k+1}^{n} \bigl(D_+(i)/D_-(i)\bigr) = D_-(k) \prod_{i=1}^{k-1} \bigl(D_-(i)/D_+(i)\bigr). \]

Corollary 3.1.4 Assuming the existence of the twisted factorization (3.1.10),

\[ \gamma_k = \frac{\det(J)}{\det(J^{1:k-1})\,\det(J^{k+1:n})}. \]

Note that the above corollary shows that $\gamma_k = 0$ if $J$ is singular and admits triangular factorizations (3.1.11). See also Theorem 3.1.1 which proves that $\gamma_n = 0$ when $J$ is singular.
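
The identities above are easy to check on a small matrix. The following Python sketch is an illustration only and not code from this thesis: the function names, the 0-based storage of the three diagonals, and the nonsymmetric $4 \times 4$ test matrix are assumptions made for the example. Exact rational arithmetic is used so that (3.1.12), (3.1.17) and (3.1.18) can be verified without rounding.

```python
from fractions import Fraction

def double_factorization(diag, lower, upper):
    """Return the pivots (D_plus, D_minus) of J = L+ D+ U+ = U- D- L-.

    Hypothetical storage: diag[k] = J[k][k], lower[k] = J[k+1][k],
    upper[k] = J[k][k+1], all 0-based. Assumes both factorizations exist.
    """
    n = len(diag)
    d_plus = [diag[0]] * n
    for k in range(1, n):                      # top-down elimination
        d_plus[k] = diag[k] - lower[k - 1] * upper[k - 1] / d_plus[k - 1]
    d_minus = [diag[n - 1]] * n
    for k in range(n - 2, -1, -1):             # bottom-up elimination
        d_minus[k] = diag[k] - upper[k] * lower[k] / d_minus[k + 1]
    return d_plus, d_minus

def gammas(diag, d_plus, d_minus):
    """gamma_k = D+(k) + D-(k) - J_kk, the first expression of Corollary 3.1.1."""
    return [p + m - a for p, m, a in zip(d_plus, d_minus, diag)]

def det_from_twist(k, d_plus, d_minus, gamma):
    """Right-hand side of (3.1.17) for the 0-based twist index k."""
    value = gamma[k]
    for p in d_plus[:k]:
        value *= p
    for m in d_minus[k + 1:]:
        value *= m
    return value

# Arbitrary illustrative matrix (an assumption of this sketch).
diag = [Fraction(x) for x in (2, 3, 4, 5)]
lower = [Fraction(x) for x in (1, 1, 1)]
upper = [Fraction(x) for x in (1, 2, 1)]
d_plus, d_minus = double_factorization(diag, lower, upper)
gamma = gammas(diag, d_plus, d_minus)
dets = [det_from_twist(k, d_plus, d_minus, gamma) for k in range(len(diag))]
```

Every entry of `dets` is the same number, as (3.1.17) requires, and consecutive entries of `gamma` satisfy the multiplicative recurrence (3.1.18).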

The  Double  Factorization  theorem  is  not  new.  In  1992,  in  [108],  Meurant  reviewed 
a  significant  portion  of  the  literature  on  the  inverses  of  band  matrices  and  presented  the 
main ideas in a nice unified framework. The inexpensive additive formulae for $(J^{-1})_{kk}$ (Theorem 3.1.2 and Corollary 3.1.1) are included in Theorem 3.1 of [108], while our Corollary 3.1.3 that gives the quotient/product form of $(J^{-1})_{kk}$ is given in Theorem 2.3 of [108]. We believe that such formulae have been known for quite some time in the differential equations community. However, these researchers were not interested in computing eigenvectors but in obtaining analytic expressions for elements of the inverse, when possible, and in the decay rate in terms of distance from the main diagonal. Our contribution is in recognizing the importance of twisted factorizations and successfully applying them to solve some elusive problems in numerical linear algebra. We will show some of these applications in this thesis; for other applications see [42, 115, 74]. In addition to the papers reviewed in [108], twisted factorizations have appeared in various contexts in the literature, see [77, 94, 5, 58, 130, 133, 44]. For a brief review, see Section 4.1 of [117]. Twisted factorizations have also been referred to as BABE factorizations (Begin, or Burn, at Both Ends) in [78, 57, 74].


3.2  The  Eigenvector  Connection 


Given an eigenvalue approximation $\hat{\lambda}$ of $J$, we can compute the double factorization of $J - \hat{\lambda} I$ by Theorem 3.1.2. In this section, we see how double factorization can be used to find a “good” equation to omit when solving the system $(J - \hat{\lambda} I)x = 0$, thereby obtaining a good approximation to the desired eigenvector. Some of the results of this section have appeared in [117].

Since $\gamma_k^{-1} = e_k^*(J - \hat{\lambda} I)^{-1}e_k$, if we let $J = V\Lambda V^*$, we get an expression for $\gamma_k$ that is similar to (3.1.9),

\[ \frac{1}{\gamma_k} = e_k^* V(\Lambda - \hat{\lambda} I)^{-1} V^* e_k, \]

\[ \frac{1}{\gamma_k} = \frac{|v_j(k)|^2}{\lambda_j - \hat{\lambda}} + \sum_{i \neq j} \frac{|v_i(k)|^2}{\lambda_i - \hat{\lambda}}. \tag{3.2.19} \]

Thus when $\hat{\lambda}$ is close to $\lambda_j$ and $\lambda_j$ is isolated, the value of $\gamma_k$ reflects the value of $v_j(k)$, i.e., an above average value of $v_j(k)$ leads to a tiny value of $\gamma_k$ while a small $v_j(k)$ implies a large $\gamma_k$. We now make this claim more precise.


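Lemma 3.2.1 Let $J - \mu I$ be a normal, unreduced nonsingular tridiagonal matrix that satisfies the hypotheses of Theorem 3.1.2 for all $\mu$ in some open interval containing the eigenvalue $\lambda_j$. Let $\gamma_k = \gamma_k(\mu)$ be as in (3.1.13), for each $k$. As $\mu \to \lambda_j$,

\[ \frac{\gamma_k^{-1}}{\sum_{m=1}^{n} \gamma_m^{-1}} \to |v_j(k)|^2, \qquad k = 1, 2, \ldots, n. \tag{3.2.20} \]

Proof. For $\mu \neq \lambda_j$,

\[ (J - \mu I)^{-1} = V(\Lambda - \mu I)^{-1}V^* = \sum_i v_i v_i^* (\lambda_i - \mu)^{-1}. \]

From (3.1.13) in Theorem 3.1.2, $\gamma_k^{-1} = [(J - \mu I)^{-1}]_{kk}$. Since $\lambda_j$ is simple by Lemma 3.0.2, as $\mu \to \lambda_j$,

\[
\begin{aligned}
(\lambda_j - \mu)(J - \mu I)^{-1} &\to v_j v_j^*, \\
(\lambda_j - \mu)\,\gamma_k^{-1} &\to |v_j(k)|^2, \qquad k = 1, 2, \ldots, n, \\
(\lambda_j - \mu) \sum_{m=1}^{n} \gamma_m^{-1} &\to \|v_j\|^2 = 1.
\end{aligned}
\]

□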
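
The limit (3.2.20) can be observed numerically. The sketch below is only an illustration under stated assumptions: the helper name `gammas_sym` is hypothetical, and the test matrix is the second-difference matrix with diagonal 2 and off-diagonals $-1$, chosen because its eigenvalues $2 - 2\cos(j\pi/(n+1))$ and normalized eigenvector entries $\sqrt{2/(n+1)}\,\sin(jk\pi/(n+1))$ are known in closed form.

```python
import math

def gammas_sym(alpha, beta, mu):
    """gamma_k of T - mu*I for a symmetric tridiagonal T.

    alpha: diagonal, beta: off-diagonal (both hypothetical names).
    Uses gamma_k = D+(k) + D-(k) - (T - mu*I)_kk and assumes no pivot
    that is used as a divisor vanishes.
    """
    n = len(alpha)
    a = [x - mu for x in alpha]
    dp = [a[0]] * n
    for k in range(1, n):
        dp[k] = a[k] - beta[k - 1] ** 2 / dp[k - 1]
    dm = [a[n - 1]] * n
    for k in range(n - 2, -1, -1):
        dm[k] = a[k] - beta[k] ** 2 / dm[k + 1]
    return [dp[k] + dm[k] - a[k] for k in range(n)]

n, j = 5, 1
alpha, beta = [2.0] * n, [-1.0] * (n - 1)
lam = 2.0 - 2.0 * math.cos(j * math.pi / (n + 1))          # exact eigenvalue
v_sq = [2.0 / (n + 1) * math.sin(j * k * math.pi / (n + 1)) ** 2
        for k in range(1, n + 1)]                           # |v_j(k)|^2

inv = [1.0 / g for g in gammas_sym(alpha, beta, lam + 1e-6)]
total = sum(inv)
weights = [x / total for x in inv]                          # left side of (3.2.20)
```

With the shift only $10^{-6}$ away from $\lambda_j$, the entries of `weights` agree with `v_sq` to roughly the size of that perturbation.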

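The following theorem replaces the limits of Lemma 3.2.1 with error bounds. It implies that when $\mu$ is sufficiently close to $\lambda_j$ then one of the $\gamma_k$'s must be small. Note that in the following theorem, the matrix need not be tridiagonal and the eigenvalues may be complex.

Theorem 3.2.1 Let $J - \mu I$ be a normal, invertible matrix, and let

\[ \gamma_k^{-1} = e_k^*(J - \mu I)^{-1} e_k \quad \text{for } k = 1, 2, \ldots, n. \]

Then if $v_j(k) \neq 0$,

\[ \gamma_k = \frac{\lambda_j - \mu}{|v_j(k)|^2}\left[1 + \bigl(|v_j(k)|^{-2} - 1\bigr)A_1\right]^{-1}. \tag{3.2.21} \]

Here $A_1$ is a weighted arithmetic mean of $\left\{ \frac{\lambda_j - \mu}{\lambda_i - \mu},\ i \neq j \right\}$, $0 \le |A_1| \le |\lambda_j - \mu|/\mathrm{gap}(\mu)$, where $\mathrm{gap}(\mu) = \min_{i \neq j} |\lambda_i - \mu|$. Furthermore, if $\lambda_j$ is isolated enough, i.e.,

\[ \frac{|\lambda_j - \mu|}{\mathrm{gap}(\mu)} \le \frac{1}{M} \cdot \frac{1}{n-1} \quad \text{where } M > 1, \tag{3.2.22} \]

then for $k$ such that $|v_j(k)| \ge 1/\sqrt{n}$,

\[ |\gamma_k| \le \frac{|\lambda_j - \mu|}{|v_j(k)|^2} \cdot \frac{M}{M-1} \le n\,|\lambda_j - \mu| \cdot \frac{M}{M-1}. \tag{3.2.23} \]

Proof.

\[ \gamma_k^{-1} = e_k^*(J - \mu I)^{-1} e_k = \sum_{i=1}^{n} \frac{|v_i(k)|^2}{\lambda_i - \mu}. \]

Extract the $j$th term to find

\[ \gamma_k^{-1} = \frac{|v_j(k)|^2}{\lambda_j - \mu}\left(1 + \sum_{i \neq j} \left|\frac{v_i(k)}{v_j(k)}\right|^2 \frac{\lambda_j - \mu}{\lambda_i - \mu}\right). \tag{3.2.24} \]

Since

\[ \sum_{i \neq j} \left|\frac{v_i(k)}{v_j(k)}\right|^2 = \frac{1 - |v_j(k)|^2}{|v_j(k)|^2}, \]

the $\sum$ term in (3.2.24) may be written as

\[ \bigl(|v_j(k)|^{-2} - 1\bigr)A_1, \qquad |A_1| \le |\lambda_j - \mu|/\mathrm{gap}(\mu), \]

where

\[ A_1 = \sum_{i \neq j} w_i\,\frac{\lambda_j - \mu}{\lambda_i - \mu}, \qquad 1 = \sum_{i \neq j} w_i, \quad w_i \ge 0, \qquad \mathrm{gap}(\mu) = \min_{i \neq j} |\lambda_i - \mu|, \]

to yield the equality in (3.2.21). If (3.2.22) holds, then

\[ |\gamma_k| \le \frac{|\lambda_j - \mu|}{|v_j(k)|^2}\left[1 - \bigl(|v_j(k)|^{-2} - 1\bigr)\frac{1}{M \cdot (n-1)}\right]^{-1}. \]

For $k$ such that $|v_j(k)| \ge 1/\sqrt{n}$,

\[ |\gamma_k| \le \frac{|\lambda_j - \mu|}{|v_j(k)|^2}\left[1 - \frac{1}{M}\right]^{-1}. \]

□

In cases of interest, $|\lambda_j - \mu|/\mathrm{gap}(\mu) = O(\varepsilon)$ and hence $M \gg 1$. Thus the factor $M/(M-1)$ in (3.2.23) is $\approx 1$.

The following theorem shows a way to compute the vector $z$ that satisfies all the equations of $(J - \mu I)z = 0$ except the $k$th.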


Theorem 3.2.2 Let $J - \mu I$ be an unreduced tridiagonal matrix that permits the twisted factorization (3.1.10) for a fixed value of $k$. Then the system

\[ (J - \mu I)z^{(k)} = \gamma_k e_k \tag{3.2.25} \]

has a unique solution given by

\[
\begin{aligned}
z^{(k)}(k) &= 1, \\
z^{(k)}(j) &= -U_+(j)\,z^{(k)}(j+1), \qquad j = k-1, \ldots, 1, \\
z^{(k)}(i) &= -L_-(i-1)\,z^{(k)}(i-1), \qquad i = k+1, \ldots, n.
\end{aligned}
\]

Proof. Since $N_k e_k = e_k$ and $D_k e_k = \gamma_k e_k$,

\[ (J - \mu I)z^{(k)} = \gamma_k e_k \implies \tilde{N}_k z^{(k)} = e_k \]

from which the result follows. □

Note that when $J$ is unreduced, the above theorem precludes $z^{(k)}$ from having a zero entry. This is consistent with Theorem 3.0.3. A benefit of computing $z^{(k)}$ is the availability of the Rayleigh Quotient.


Corollary 3.2.1 Let $J - \mu I$ satisfy the hypothesis of Theorem 3.1.2, and $z = z^{(k)}$ be as in (3.2.25) for $1 \le k \le n$. Then the Rayleigh Quotient of $z$ is given by

\[ \frac{z^* J z}{z^* z} = \mu + \frac{\gamma_k}{\|z\|^2}. \tag{3.2.26} \]

Proof. By (3.2.25),

\[ z^*(J - \mu I)z = \gamma_k z^* e_k = \gamma_k \]

since $z(k) = 1$. The result (3.2.26) follows. □

Since $\|z^{(k)}\| \ge 1$, Theorem 3.2.1 implies that the residual norm

\[ \frac{\|(J - \mu I)z^{(k)}\|}{\|z^{(k)}\|} = \frac{|\gamma_k|}{\|z^{(k)}\|} \tag{3.2.27} \]

is small when $\mu$ is a good approximation to an isolated eigenvalue $\lambda$, and $k$ is chosen appropriately. The following theorem gives a better bound on $\min_k |\gamma_k|/\|z^{(k)}\|$.


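Theorem 3.2.3 Let $J - \mu I$ be a normal, invertible matrix, and let

\[ (J - \mu I)z^{(k)} = e_k \gamma_k, \quad \text{for } k = 1, 2, \ldots, n. \]

Then if $v_j(k) \neq 0$,

\[ \frac{|\gamma_k|}{\|z^{(k)}\|} = \frac{|\mu - \lambda_j|}{|v_j(k)|}\left[1 + \bigl(|v_j(k)|^{-2} - 1\bigr)A_2\right]^{-1/2} \le \frac{|\mu - \lambda_j|}{|v_j(k)|} \le \sqrt{n}\,|\mu - \lambda_j| \ \text{ for at least one } k. \tag{3.2.28} \]

Here $A_2$ is a weighted arithmetic mean of $\left\{ \left|\frac{\mu - \lambda_j}{\mu - \lambda_i}\right|^2,\ i \neq j \right\}$, $0 < A_2 \le \bigl[\,|\lambda_j - \mu|/\mathrm{gap}(\mu)\bigr]^2$.

Proof.

\[
\begin{aligned}
z^{(k)} &= (J - \mu I)^{-1} e_k \gamma_k, \\
\|z^{(k)}\|^2 &= |\gamma_k|^2\, e_k^* V(\Lambda - \mu I)^{-*}(\Lambda - \mu I)^{-1} V^* e_k \\
&= |\gamma_k|^2 \sum_{i=1}^{n} \frac{|v_i(k)|^2}{|\mu - \lambda_i|^2}.
\end{aligned}
\]

Extract the $j$th term to find

\[
\begin{aligned}
\left(\frac{\|z^{(k)}\|}{|\gamma_k|}\right)^2 &= \left(\frac{|v_j(k)|}{|\mu - \lambda_j|}\right)^2 \left[1 + \sum_{i \neq j} \left|\frac{v_i(k)}{v_j(k)}\right|^2 \left|\frac{\mu - \lambda_j}{\mu - \lambda_i}\right|^2\right], \\
\frac{|\gamma_k|}{\|z^{(k)}\|} &= \frac{|\mu - \lambda_j|}{|v_j(k)|}\left[1 + \bigl(|v_j(k)|^{-2} - 1\bigr)A_2\right]^{-1/2},
\end{aligned}
\]

where

\[ A_2 = \sum_{i \neq j} w_i \left|\frac{\mu - \lambda_j}{\mu - \lambda_i}\right|^2, \qquad 1 = \sum_{i \neq j} w_i, \quad w_i \ge 0, \qquad 0 < A_2 \le \bigl[\,|\lambda_j - \mu|/\mathrm{gap}(\mu)\bigr]^2. \]

Since $\bigl(|v_j(k)|^{-2} - 1\bigr)A_2 \ge 0$, (3.2.28) follows easily from the above. □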

The above theorems suggest a way to compute a vector $z^{(k)}$ such that the residual norm (3.2.27) is small. In 1995, Fernando, in an equivalent formulation, proposed choosing the index for which $|\gamma_k|$ is minimum, say $r$, and then solving $(J - \mu I)z^{(r)} = \gamma_r e_r$ to compute an approximate eigenvector $z^{(r)}$. See [57] for his subsequent work. Earlier, in 1985, Godunov and his collaborators proposed a similar but more obscure method for obtaining a provably accurate approximation to an eigenvector by ‘sewing’ together two “Sturm Sequences” that start at either end of the matrix. See [64] and [63] for their work, and Section 4.2 of [117] for interpretation in our notation. Fernando’s approach leads to the following algorithm.


Algorithm 3.2.1 [Computing an eigenvector of an isolated eigenvalue.]

1. Compute $J - \mu I = L_+D_+U_+ = U_-D_-L_-$.

2. Compute $\gamma_k$ for $k = 1, 2, \ldots, n$ using the expressions in Corollary 3.1.1 or (3.1.18) in Theorem 3.1.3 and choose the index $r$ where $|\gamma_k|$ is minimum.

3. Form the vector $z^{(r)}$ as in Theorem 3.2.2.

□
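
The three steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation developed in this thesis: it assumes a real symmetric tridiagonal matrix stored as a diagonal `alpha` and an off-diagonal `beta`, assumes that no zero pivot occurs, and uses hypothetical function names. The test data (the second-difference matrix and a shift $10^{-6}$ above its smallest eigenvalue) are likewise assumptions of the example.

```python
import math

def twisted_eigenvector(alpha, beta, mu):
    """Sketch of Algorithm 3.2.1; returns (r, gamma_r, z) with 0-based r, z[r] == 1."""
    n = len(alpha)
    a = [x - mu for x in alpha]
    # Step 1: pivots of L+ D+ U+ (top-down) and U- D- L- (bottom-up).
    dp = [a[0]] * n
    for k in range(1, n):
        dp[k] = a[k] - beta[k - 1] ** 2 / dp[k - 1]
    dm = [a[n - 1]] * n
    for k in range(n - 2, -1, -1):
        dm[k] = a[k] - beta[k] ** 2 / dm[k + 1]
    # Step 2: gamma_k = D+(k) + D-(k) - (T - mu*I)_kk; choose the smallest |gamma_k|.
    gamma = [dp[k] + dm[k] - a[k] for k in range(n)]
    r = min(range(n), key=lambda k: abs(gamma[k]))
    # Step 3: multiplicative recurrences of Theorem 3.2.2.
    z = [0.0] * n
    z[r] = 1.0
    for j in range(r - 1, -1, -1):
        z[j] = -(beta[j] / dp[j]) * z[j + 1]          # U+(j) = beta_j / D+(j)
    for i in range(r + 1, n):
        z[i] = -(beta[i - 1] / dm[i]) * z[i - 1]      # L-(i-1) = beta_{i-1} / D-(i)
    return r, gamma[r], z

def residual(alpha, beta, mu, z):
    """Entries of (T - mu*I) z for the same storage scheme."""
    n = len(alpha)
    out = []
    for k in range(n):
        s = (alpha[k] - mu) * z[k]
        if k > 0:
            s += beta[k - 1] * z[k - 1]
        if k < n - 1:
            s += beta[k] * z[k + 1]
        out.append(s)
    return out

n = 5
alpha, beta = [2.0] * n, [-1.0] * (n - 1)
lam = 2.0 - 2.0 * math.cos(math.pi / (n + 1))   # smallest eigenvalue of this T
mu = lam + 1e-6
r, gamma_r, z = twisted_eigenvector(alpha, beta, mu)
res = residual(alpha, beta, mu, z)
rayleigh = mu + gamma_r / sum(x * x for x in z)  # Corollary 3.2.1
```

In this example the residual `res` vanishes (to rounding) in every position except `r`, where it equals `gamma_r`, and the Rayleigh quotient of Corollary 3.2.1 is noticeably closer to the eigenvalue than the shift itself.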


Note that $z^{(r)}$ is formed by multiplications only, thus promising greater accuracy and a more favorable error analysis than standard methods that involve additions. Using (3.2.23) and the fact that $\|z^{(r)}\| \ge 1$, the above algorithm bounds the residual norm by

\[ \frac{\|(J - \mu I)z^{(r)}\|}{\|z^{(r)}\|} \le n\,|\lambda_j - \mu| \cdot \frac{M}{M-1}. \tag{3.2.29} \]

This solves Wilkinson’s problem of choosing which equation to omit, modulo the mild assumption (3.2.22) about the separation of $\lambda_j$. As we shall emphasize in later chapters, we will use Algorithm 3.2.1 only when $\lambda_j$ is sufficiently isolated.

We have noted, and so has Jessie Barlow [8], that a simple recurrence will yield all values of $\|z^{(k)}\|$ for $O(n)$ operations. Consequently it would be feasible to minimize $|\gamma_k|/\|z^{(k)}\|$ instead of $|\gamma_k|$ to obtain a possibly smaller residual norm. At present we feel that the extra expense is not warranted. The bound given by (3.2.28) is certainly better than the above bound (3.2.29). However, the latter bound is much too pessimistic since $\|z^{(r)}\|$ can be as large as $\sqrt{n}$ when all eigenvector entries are equal in magnitude.

3.3  Zero  Pivots 

We  now  show  that  the  procedure  to  compute  an  eigenvector  suggested  by  the 
previous section needs to be modified only slightly when the shifted matrix $J - \mu I$, which we simply call $J$ in this section, does not admit the triangular factorizations (3.1.11) in Theorem 3.1.2. We will continue to assume exact arithmetic; the effect of roundoff errors in a computer implementation is examined in the next chapter.

Triangular factorization is said to fail, or not exist, if a zero ‘pivot’, $D_+(j)$ or $D_-(j)$, is encountered prematurely. The last pivot is allowed to vanish because it does not occur as a denominator in the computation.

One of the attractions of an unreduced tridiagonal matrix is that the damage done by a zero pivot is localized. Indeed, if $\pm\infty$ is added to the number system then triangular factorization cannot break down and the algorithm always maps $J$ into unique triplets $L_+, D_+, U_+$ and $U_-, D_-, L_-$. There is no need to spoil the inner loop with tests. It is no longer true that $L_+D_+U_+ = J = U_-D_-L_-$ but equality does hold for all entries except for those at or adjacent to any infinite pivot. IEEE arithmetic [2] allows computer implementations to handle $\pm\infty$ and take advantage of this feature of tridiagonals.

We now show that proceeding thus, it is meaningful to pick $|\gamma_r| = \min_k |\gamma_k|$, omit the $r$th equation and solve for $z^{(r)}$ even when triangular factorization breaks down. We first handle the case when $J$ is singular.

When  J  is  singular,  it  may  appear  that  any  equation  can  be  omitted  to  solve 
Jz  =  0  by  the  method  suggested  in  Theorem  3.2.2.  However,  zero  entries  in  z  do  not  allow 
such a simple solution.

Theorem 3.3.1 Let $J$ be $n \times n$, tridiagonal, unreduced, and singular. For each $k$, $1 < k < n$, $J^{1:k-1}$ is singular if, and only if, $J^{k+1:n}$ is singular. They are singular if, and only if, $z(k) = 0$ whenever $Jz = 0$.

Proof. Write

\[ z = \begin{bmatrix} z_+ \\ z(k) \\ z_- \end{bmatrix} \]

and partition $Jz = 0$ conformably. Thus

\[ J^{1:k-1} z_+ + J_{k-1,k}\,z(k)\,e_{k-1} = 0, \tag{3.3.30} \]

\[ e_1 J_{k+1,k}\,z(k) + J^{k+1:n} z_- = 0, \tag{3.3.31} \]

and $z_+(1) \neq 0$, $z_-(n) \neq 0$ by Lemma 3.0.1.

If $z(k) = 0$ then (3.3.30) shows that $z_+$ ($\neq 0$) is in the null space of $J^{1:k-1}$ and (3.3.31) shows that $z_-$ ($\neq 0$) is in the null space of $J^{k+1:n}$. So both matrices are singular.

Now consider the converse, $z(k) \neq 0$. Since $J$ is unreduced, $\mathrm{rank}(J) = n - 1$ and its null space is one dimensional. So the system

\[ Jz = 0, \qquad z(k) = 1, \]

has a unique solution. Thus both (3.3.30) and (3.3.31) are inhomogeneous equations with unique solutions. Thus $J^{1:k-1}$ and $J^{k+1:n}$ are invertible. □

Clearly when $z(k) = 0$, Theorem 3.2.2 should not be used to compute $z$. When $z(k) = 0$, Theorem 3.3.1 implies that $D_+(k-1) = D_-(k+1) = 0$, and the formulae for $\gamma_k$ in Corollary 3.1.1 give $\gamma_k = \infty + \infty$ or $\infty - \infty$. When $z(k) \neq 0$, Corollary 3.1.4 implies that $\gamma_k = 0$. By Lemma 3.0.1, $z(1) \neq 0$ and $z(n) \neq 0$, and consequently $\gamma_1 = \gamma_n = 0$. Thus the index $r$ where $|\gamma_k|$ is minimum is such that $z(r) \neq 0$. To account for possible breakdown in triangular factorization, the method of Theorem 3.2.2 may be modified slightly.

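Algorithm 3.3.1 [Vectors with zeros.]

\[
\begin{aligned}
z(r) &= 1, \\
z(j) &= \begin{cases} -U_+(j)\,z(j+1), & z(j+1) \neq 0, \\ -\bigl(J_{j+1,j+2}\,z(j+2)/J_{j+1,j}\bigr), & \text{otherwise}, \end{cases} \qquad j = r-1, \ldots, 1, \\
z(i) &= \begin{cases} -L_-(i-1)\,z(i-1), & z(i-1) \neq 0, \\ -\bigl(J_{i-1,i-2}\,z(i-2)/J_{i-1,i}\bigr), & \text{otherwise}, \end{cases} \qquad i = r+1, \ldots, n.
\end{aligned}
\]

□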
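
A possible rendering of this modified recurrence is sketched below. It is an illustration under stated assumptions, not the thesis implementation: the matrix is taken to be real symmetric and unreduced, stored as `alpha` and `beta`; the helper `div` is a hypothetical stand-in for IEEE division, needed because Python raises an exception on float division by zero; and a NaN value arising from $\infty - \infty$ is simply ranked as infinite when choosing $r$.

```python
import math

def div(x, y):
    """x / y, returning a signed infinity instead of raising when y == 0 (x != 0 assumed)."""
    if y == 0.0:
        return math.copysign(math.inf, x)
    return x / y

def eigenvector_with_zeros(alpha, beta, mu):
    """Sketch of Algorithm 3.3.1; returns (r, gamma, z) with 0-based r and z[r] == 1."""
    n = len(alpha)
    a = [x - mu for x in alpha]
    dp = [a[0]] * n                       # D+ pivots; a zero pivot makes the next one infinite
    for k in range(1, n):
        dp[k] = a[k] - div(beta[k - 1] ** 2, dp[k - 1])
    dm = [a[n - 1]] * n                   # D- pivots
    for k in range(n - 2, -1, -1):
        dm[k] = a[k] - div(beta[k] ** 2, dm[k + 1])
    gamma = [dp[k] + dm[k] - a[k] for k in range(n)]
    # inf - inf gives NaN; rank it as infinite so that it is never selected.
    size = [math.inf if math.isnan(g) else abs(g) for g in gamma]
    r = size.index(min(size))
    z = [0.0] * n
    z[r] = 1.0
    for j in range(r - 1, -1, -1):
        if z[j + 1] != 0.0:
            z[j] = -div(beta[j], dp[j]) * z[j + 1]
        else:                              # row j+1 of (T - mu*I) z = 0
            z[j] = -beta[j + 1] * z[j + 2] / beta[j]
    for i in range(r + 1, n):
        if z[i - 1] != 0.0:
            z[i] = -div(beta[i - 1], dm[i]) * z[i - 1]
        else:                              # row i-1 of (T - mu*I) z = 0
            z[i] = -beta[i - 2] * z[i - 2] / beta[i - 1]
    return r, gamma, z

# Singular 3 x 3 example with zero diagonal; its null vector has a zero middle entry.
r, gamma, z = eigenvector_with_zeros([0.0, 0.0, 0.0], [1.0, 1.0], 0.0)
```

In this small example the middle pivot of each factorization is infinite, the corresponding $\gamma_k$ is infinite, and the recurrence recovers the null vector with its zero entry without any special casing in the pivot loops.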


Thus the case of singular $J$ is handled correctly. Note that the multiplicative recurrence given by (3.1.18) breaks down when computing $\gamma_k$ as

\[ \gamma_k = \gamma_{k+1}\,\frac{D_+(k)}{D_-(k+1)} \]

if $D_+(k) = 0$, $D_-(k+2) = 0$ and $\gamma_{k+1} = \infty$.

Now we consider the case when $J$ is nonsingular and triangular factorization breaks down. When $D_+(k-1)$ or $D_-(k+1)$ equals zero, the expressions in Corollary 3.1.1 imply that $\gamma_k = \infty$. Note that both $D_+(k-1)$ and $D_-(k+1)$ cannot be zero, otherwise $J$ would be singular. Unlike the singular case, it is possible that all $|\gamma_k|$ are $\infty$. We now show that this occurs only in very special circumstances. We recall that $\gamma_k^{-1} = (J^{-1})_{kk}$.


Lemma 3.3.1 Let $J$ be a complex tridiagonal matrix of order $n$ with $\mathrm{diag}(J) = 0$. Then

if $n$ is odd, $\chi_n(\mu) = -\chi_n(-\mu)$, for all $\mu \in \mathbb{C}$, and
if $n$ is even, $\chi_n(\mu) = \chi_n(-\mu)$, for all $\mu \in \mathbb{C}$,

where $\chi_i(\mu) = \det(\mu I - J^{1:i})$.

Proof. If $\mathrm{diag}(J) = 0$,

\[ \chi_i(\mu) = \mu\,\chi_{i-1}(\mu) - J_{i,i-1}J_{i-1,i}\,\chi_{i-2}(\mu). \]

We prove the result by induction on $i$. $\chi_1(\mu) = \mu$ is an odd function, while $\chi_2(\mu) = \mu^2 - J_{12}J_{21}$ is even and the result holds for the base cases. In the inductive step, if $i$ is odd

\[
\begin{aligned}
\chi_i(-\mu) &= (-\mu)\,\chi_{i-1}(-\mu) - J_{i,i-1}J_{i-1,i}\,\chi_{i-2}(-\mu) \\
&= -\mu\,\chi_{i-1}(\mu) + J_{i,i-1}J_{i-1,i}\,\chi_{i-2}(\mu) = -\chi_i(\mu).
\end{aligned}
\]

Similarly, if $i$ is even

\[ \chi_i(-\mu) = (-\mu)\,\chi_{i-1}(-\mu) - J_{i,i-1}J_{i-1,i}\,\chi_{i-2}(-\mu) = \chi_i(\mu). \]

□

Nonsingular $J$ that have zero diagonals have a special pattern of zeros and nonzeros, as illustrated by the following example.

Theorem 3.3.2 Let $J$ be a nonsingular tridiagonal matrix. Then

\[ \mathrm{diag}(J) = 0 \iff \mathrm{diag}(J^{-1}) = 0. \]

Proof. In this proof, we will use the famous Cauchy-Binet formula

\[ J \cdot \mathrm{adj}(J) = \det(J) \cdot I \tag{3.3.33} \]

where $\mathrm{adj}(J)$ is the adjugate of $J$ and is the transpose of the matrix of cofactors [129, p.402], to get expressions for elements of $J^{-1}$. By (3.3.33),

\[ (J^{-1})_{ii} = \frac{\det(J^{1:i-1})\,\det(J^{i+1:n})}{\det(J)}. \tag{3.3.34} \]

Suppose $\mathrm{diag}(J) = 0$. The nonsingularity of $J$, and Lemma 3.3.1 imply that $n$ must be even. Since one of $J^{1:i-1}$ or $J^{i+1:n}$ must be of odd order, Lemma 3.3.1 and (3.3.34) imply that $(J^{-1})_{ii} = 0$ for $1 \le i \le n$.

Now suppose that $\mathrm{diag}(J^{-1}) = 0$. In this case, the key fact needed for the proof is that every leading (and trailing) principal submatrix of $J$ of odd dimension is singular. This behavior can be observed in (3.3.32). We now prove this claim.

First, we observe that no two consecutive leading submatrices can be singular since otherwise, by the three-term recurrence

\[ \det(J^{1:i}) = J_{ii}\det(J^{1:i-1}) - J_{i-1,i}J_{i,i-1}\det(J^{1:i-2}), \tag{3.3.35} \]

$J$ would be singular. Similarly no two consecutive trailing submatrices can be singular. Now

\[ (J^{-1})_{11} = 0 \implies \det(J^{2:n}) = 0 \quad \text{by (3.3.34)}, \]

and hence, $\det(J^{3:n}) \neq 0$. Consequently,

\[ (J^{-1})_{22} = 0 \implies \det(J^{1:1}) = 0, \quad \text{again by (3.3.34)}. \]

Similarly by setting $i = 3, 4$ and so on in (3.3.34), we can conclude that the principal submatrices $J^{4:n}, J^{1:3}, J^{6:n}, J^{1:5}, \ldots$ must be singular. In particular, all leading principal submatrices of odd dimension are singular. Now, if $n$ is odd then $n - 2$ is also odd and by the above reasoning, $J^{1:n-2}$ is singular. However, by setting $i = n$ in (3.3.34), we see that

\[ \det(J^{1:n-1}) = 0, \]

but no two consecutive principal submatrices of a nonsingular $J$ can be singular. Hence $n$ cannot be odd, and must be even.

When $i$ is odd, the submatrices $J^{1:i}$ and $J^{1:i-2}$ are singular and by (3.3.35), we get

\[ 0 = \det(J^{1:i}) = J_{ii}\det(J^{1:i-1}), \]

and $J_{ii}$ must be 0 since $\det(J^{1:i-1}) \neq 0$ in this case. Similarly, since $n$ is even and by considering the trailing recurrence

\[ \det(J^{i:n}) = J_{ii}\det(J^{i+1:n}) - J_{i,i+1}J_{i+1,i}\det(J^{i+2:n}), \]

we can conclude that $J_{ii} = 0$ for even $i$. □

The pattern of zeros and nonzeros in $J^{-1}$ that is used to prove the above result may also be deduced by using the following lemma which states that the upper (lower) triangular part of $J^{-1}$ is a rank-one matrix. This result has appeared in [4, 14, 83, 78, 42].

Lemma 3.3.2 Let $J$ be a nonsingular unreduced tridiagonal matrix of order $n$. Then there exist vectors $x$, $y$, $p$ and $q$ such that

\[ (J^{-1})_{ik} = \begin{cases} x_i y_k, & i \le k, \\ p_i q_k, & i \ge k. \end{cases} \]

The following corollary says that the spectrum of tridiagonals that have a zero diagonal is symmetric.

Corollary 3.3.1 Let $J$ be a tridiagonal matrix such that $\mathrm{diag}(J) = 0$. Then if $\lambda$ is an eigenvalue of $J$, so is $-\lambda$. If $v$ and $w$ are the corresponding normalized eigenvectors, then $|v(i)| = |w(i)|$ for $1 \le i \le n$.

Proof. Lemma 3.3.1 implies that eigenvalues occur in $\pm$ pairs. The corresponding eigenvector entries are equal in magnitude by (3.0.8) since $\chi_i(\lambda) = \pm\chi_i(-\lambda)$. □
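
The parity property of Lemma 3.3.1, on which this corollary rests, can be checked directly from the three-term recurrence. The short Python sketch below is illustrative only; the function name `chi_values` and the integer off-diagonal entries are assumptions made for the example.

```python
def chi_values(lower, upper, mu):
    """chi_i(mu) = det(mu*I - J^{1:i}), i = 0..n, for a tridiagonal J with zero diagonal.

    lower[i] = J[i+1][i] and upper[i] = J[i][i+1] (0-based); chi_0 = 1.
    """
    n = len(lower) + 1
    chi = [1, mu]
    for i in range(2, n + 1):
        chi.append(mu * chi[i - 1] - lower[i - 2] * upper[i - 2] * chi[i - 2])
    return chi

lower, upper = [1, 2, 3, 4], [2, 1, 5, 1]     # arbitrary illustrative entries
plus = chi_values(lower, upper, 3)
minus = chi_values(lower, upper, -3)
parity_ok = all(minus[i] == (-1) ** i * plus[i] for i in range(len(plus)))
```

Because $\chi_i(-\mu) = (-1)^i \chi_i(\mu)$ for every $i$, any root $\lambda$ of $\chi_n$ is accompanied by the root $-\lambda$.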

We have shown that for all $|\gamma_k|$ to be $\infty$, all eigenvalues of the shifted matrix $J - \mu I$ must occur in $\pm$ pairs. Thus $\mu$ is equidistant to the two closest eigenvalues. Note that this result is consistent with Theorem 3.2.1. We will be careful to avoid this situation in our application. As we stated in the previous section, we will use Algorithm 3.2.1 to compute an eigenvector only when the corresponding eigenvalue is “sufficiently isolated” (we make the meaning of “isolated” more precise in the upcoming chapters). Then, as Theorem 3.2.1 shows, at least one $\gamma_k$ must be small if $\mu$ is close to the eigenvalue. Breakdown in the triangular factorization does not hinder finding such a $\gamma_k$. The corresponding eigenvector is approximated well by Algorithm 3.3.1 given earlier in this section.

We  now  show  how  the  ideas  developed  in  this  chapter  enable  us  to  correctly  handle 
the  matrix  T  of  (3.0.4). 

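Example 3.3.1 [Minimum $|\gamma_k|$ leads to a good eigenvector approximation.] Consider $T$ as in (3.0.4) of Example 3.0.1, and the eigenvalue approximation $\hat{\lambda} = 1$ so that the error in the eigenvalue is $\varepsilon + O(\varepsilon^2)$.

\[
(T - I)^{-1} = \begin{bmatrix}
4/(-4 + 3\varepsilon) & 0 & -\sqrt{\varepsilon}/(-4 + 3\varepsilon) \\
0 & 0 & 1/\sqrt{\varepsilon} \\
-\sqrt{\varepsilon}/(-4 + 3\varepsilon) & 1/\sqrt{\varepsilon} & (-4 + 10\varepsilon - 5\varepsilon^2)/\varepsilon(-4 + 3\varepsilon)
\end{bmatrix}
\]

and since $\gamma_k^{-1} = [(T - I)^{-1}]_{kk}$,

\[ \gamma_1 \approx -1, \quad |\gamma_2| = \infty \quad \text{and} \quad \gamma_3 \approx \varepsilon. \]

Since $|\gamma_3|$ is the minimum, our theory predicts that $z = (T - I)^{-1}e_3$ is a good eigenvector approximation. Indeed,

\[
\frac{z}{\|z\|} = \begin{bmatrix}
\varepsilon^{3/2}/4 + O(\varepsilon^{5/2}) \\
\sqrt{\varepsilon} + O(\varepsilon^{3/2}) \\
1 - \varepsilon/2 + O(\varepsilon^2)
\end{bmatrix}
\]

and from (3.0.5) we see that each entry of the above vector agrees with the corresponding eigenvector entry in all its figures, i.e., the relative error in each entry is $O(\varepsilon)$. □

3.4  Avoiding  Divisions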


On  modern  computers,  divisions  may  be  much  more  costly  than  multiplications. 
For  example,  on  an  IBM  RS/6000  computer,  a  divide  operation  takes  19  cycles  while  a 
multiplication can be done in 1 cycle. In this section, we see how to eliminate all divisions from Algorithm 3.2.1. This may result in a faster computer implementation to compute an eigenvector. However, elimination of divisions results in a method that is susceptible to over/underflow and requires some care to ensure correctness. This section was inspired by Fred Gustavson’s observation that $n$ divisions are sufficient for a related application [74].

In this section, we will assume that $T$ is the given unreduced real, symmetric tridiagonal matrix with diagonal elements $\alpha_1, \alpha_2, \ldots, \alpha_n$ and off-diagonal elements $\beta_1, \ldots, \beta_{n-1}$. Extensions to the normal case are trivial.

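The crucial observation is that the $r$th column of $(T - \mu I)^{-1}$ is, up to sign, the vector $w_r$ given by

\[
\chi^{1:n} w_r = \begin{bmatrix}
(1 \cdot \chi^{r+1:n})(\beta_1 \cdots \beta_{r-1}) \\
\vdots \\
(\chi^{1:r-2} \cdot \chi^{r+1:n})\,\beta_{r-1} \\
(\chi^{1:r-1} \cdot \chi^{r+1:n}) \\
(\chi^{1:r-1} \cdot \chi^{r+2:n})\,\beta_r \\
\vdots \\
(\chi^{1:r-1} \cdot 1)(\beta_r \cdots \beta_{n-1})
\end{bmatrix}
\tag{3.4.36}
\]

for $r = 1, 2, \ldots, n$, where $\chi^{i:j} = \chi^{i:j}(\mu) = \det(\mu I - T^{i:j})$, taking $\chi^{n+1:n}(\mu) = \chi^{1:0}(\mu) = 1$. The above is easily derived from the Cauchy-Binet formula (3.3.33) for the entries of the inverse of a matrix. Emphasizing that

\[ \left[(T - \mu I)^{-1}\right]_{kk} = -\frac{\chi^{1:k-1}(\mu) \cdot \chi^{k+1:n}(\mu)}{\chi^{1:n}(\mu)} \tag{3.4.37} \]

we give an alternate method to compute an eigenvector of an isolated eigenvalue.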


Algorithm 3.4.1 [Computing an eigenvector with no divisions.]

1. Compute $\chi^{1:1}(\mu), \chi^{1:2}(\mu), \ldots, \chi^{1:n-1}(\mu)$ using the 3-term recurrence

\[ \chi^{1:i}(\mu) = (\mu - \alpha_i)\,\chi^{1:i-1}(\mu) - \beta_{i-1}^2\,\chi^{1:i-2}(\mu), \qquad \chi^{1:0} = 1, \quad \beta_0 = 0. \]

2. Compute $\chi^{n:n}(\mu), \chi^{n-1:n}(\mu), \ldots, \chi^{2:n}(\mu)$ using the 3-term recurrence

\[ \chi^{i:n}(\mu) = (\mu - \alpha_i)\,\chi^{i+1:n}(\mu) - \beta_i^2\,\chi^{i+2:n}(\mu), \qquad \chi^{n+1:n} = 1, \quad \beta_n = 0. \]

3. Compute

\[ \Delta_k = \chi^{1:k-1}(\mu) \cdot \chi^{k+1:n}(\mu) \quad \text{for } k = 1, 2, \ldots, n \]

and choose the index $r$ where $|\Delta_k|$ is maximum. In exact arithmetic, by (3.1.13) and (3.4.37), this index $r$ is identical to the one found in Step 2 of Algorithm 3.2.1.

4. Form the vector $w_r \cdot \chi^{1:n}(\mu)$ by multiplying the appropriate $\chi$’s and $\beta$’s, as given in (3.4.36). □
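
The four steps can be sketched as follows. This Python fragment is an illustration only: the function name, the 0-based arrays `lead` and `trail` (holding $\chi^{1:k}$ and $\chi^{k+2:n}$ in the notation above), and the test matrix are assumptions of the example, and no rescaling against overflow or underflow is attempted.

```python
import math

def eigenvector_no_divisions(alpha, beta, mu):
    """Sketch of Algorithm 3.4.1; returns (r, w) with 0-based r, using no divisions."""
    n = len(alpha)
    # Step 1: lead[k] = det(mu*I - T[0:k]), the leading k x k block (lead[0] = 1).
    lead = [1.0] * n
    if n > 1:
        lead[1] = mu - alpha[0]
    for k in range(2, n):
        lead[k] = (mu - alpha[k - 1]) * lead[k - 1] - beta[k - 2] ** 2 * lead[k - 2]
    # Step 2: trail[k] = det(mu*I - T[k+1:n]), the trailing block (trail[n-1] = 1).
    trail = [1.0] * n
    if n > 1:
        trail[n - 2] = mu - alpha[n - 1]
    for k in range(n - 3, -1, -1):
        trail[k] = (mu - alpha[k + 1]) * trail[k + 1] - beta[k + 1] ** 2 * trail[k + 2]
    # Step 3: Delta_k = lead[k] * trail[k]; choose the largest in magnitude.
    delta = [lead[k] * trail[k] for k in range(n)]
    r = max(range(n), key=lambda k: abs(delta[k]))
    # Step 4: assemble the vector of (3.4.36) from products of chi's and beta's.
    w = [0.0] * n
    w[r] = delta[r]
    prod = 1.0
    for i in range(r - 1, -1, -1):
        prod *= beta[i]
        w[i] = lead[i] * prod * trail[r]
    prod = 1.0
    for i in range(r + 1, n):
        prod *= beta[i - 1]
        w[i] = lead[r] * prod * trail[i]
    return r, w

n = 5
alpha, beta = [2.0] * n, [-1.0] * (n - 1)
mu = 2.0 - 2.0 * math.cos(math.pi / (n + 1)) + 1e-6   # near the smallest eigenvalue
r, w = eigenvector_no_divisions(alpha, beta, mu)
```

On this well-scaled example the index `r` and the direction of `w` match what the twisted factorization approach produces; on larger or graded matrices the determinants would need the monitoring and rescaling discussed next.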

The total cost involved in the above algorithm is $8n$ multiplications and $3n$ additions. Some multiplications may be saved when computing more than one eigenvector by forming and saving the products of $\beta$’s that are used more than once. There is also some cost involved in monitoring and handling potential overflow and underflow. The cost of the above algorithm should be compared to that of Algorithm 3.2.1 which requires $2n$ divisions, $3n$ multiplications and $4n$ additions if an additive formula to compute $\gamma_k$ is used. The vector output by both these algorithms may be normalized at a further cost of 1 square root, 1 division, $2n$ multiplications and $n - 1$ additions.

However, as is well known, the quantities $\chi^{1:i}(\mu)$ and $\chi^{i:n}(\mu)$ can grow and decay rapidly, and hence the recurrences to compute them are susceptible to severe overflow and underflow problems. To overcome this, these quantities need to be monitored and rescaled when approaching overflow or underflow. Some modern computers allow such monitoring to be done in hardware but alas, this facility is not presently visible to the software developer who programs in a high level language [92].

3.4.1  Heuristics  for  choosing  r 

In Algorithm 3.2.1, the $2n$ quantities $D_+(1), \ldots, D_+(n)$ and $D_-(1), \ldots, D_-(n)$ are computed to find the minimum among all the $\gamma_k$’s. Once the right index $r$ is chosen, half of these quantities are discarded and the other half used to compute the desired eigenvector approximation. The situation is similar in Algorithm 3.4.1.

If we have a reasonable guess at a “good” index $r$, we can check the value of $\gamma_r$ by only computing the $n$ quantities $D_+(1), \ldots, D_+(r-1)$, $D_-(r+1), \ldots, D_-(n)$ and $\gamma_r$. If $\gamma_r$ is small enough we can accept $r$, otherwise we can compute all the $\gamma$’s and choose the minimum as before. In case the guess is accurate, we can save approximately half the work in the computation of the eigenvector. We now examine the proof of the Gerschgorin Disk Theorem that suggests a heuristic choice of $r$.

Theorem 3.4.1 [Gerschgorin Disk Theorem]. Let $B$ be a matrix of order $n$ with an eigenvalue $\lambda$. Then $\lambda$ lies in one of the Gerschgorin disks, i.e., there exists $r$ such that

\[ |\lambda - B_{rr}| \le \sum_{k \neq r} |B_{rk}|. \]

Proof. Let $v$ be the eigenvector corresponding to $\lambda$, and let $|v(r)| = \|v\|_\infty$. Consider the $r$th equation of $Bv = \lambda v$,

\[
\begin{aligned}
\lambda v(r) &= B_{rr}v(r) + \sum_{k \neq r} B_{rk}v(k) \\
\implies |\lambda - B_{rr}| &\le \sum_{k \neq r} |B_{rk}|\,|v(k)/v(r)|
\end{aligned}
\]

from which the result follows. □

Thus if the $k$th entry of $v$ is the largest, then $\lambda$ must lie in the $k$th Gerschgorin disk. Consequently, if $\lambda$ lies in just one Gerschgorin disk then the corresponding entry of $v$ must be the largest. However $\lambda$ may lie in many Gerschgorin disks and there is no guarantee that the corresponding entries of $v$ are large. But we may use the following heuristic:

Pick $r$ such that $\lambda$ lies in the $r$th Gerschgorin disk with minimum radius $\sum_{k \neq r} |B_{rk}|$.

Such a heuristic is inexpensive to compute and can lead to considerable savings on large matrices, especially ones that are even mildly graded. Such matrices seem to arise in many real life applications. Note that the above heuristic gives the correct choice of $r$ in the example tridiagonals of (3.0.4) and (3.0.6).
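
For a symmetric tridiagonal matrix the heuristic amounts to a single pass over the rows. The sketch below is an assumption-laden illustration, not part of the original text: the function name `gerschgorin_guess` is hypothetical, the matrix is stored as `alpha` and `beta`, and `None` is returned when the supplied approximation lies in no disk (which cannot happen for an exact eigenvalue but may for an approximation).

```python
def gerschgorin_guess(alpha, beta, lam):
    """Return the 0-based row r whose Gerschgorin disk contains lam and has the
    smallest radius, or None if lam lies in no disk."""
    n = len(alpha)
    best_r, best_radius = None, None
    for k in range(n):
        radius = 0.0
        if k > 0:
            radius += abs(beta[k - 1])
        if k < n - 1:
            radius += abs(beta[k])
        inside = abs(lam - alpha[k]) <= radius
        if inside and (best_radius is None or radius < best_radius):
            best_r, best_radius = k, radius
    return best_r

# A mildly graded example: only the first disk contains the value 1.0.
guess_graded = gerschgorin_guess([1.0, 1e-2, 1e-4], [1e-2, 1e-4], 1.0)
# All three disks contain 0.2 here; the last one has the smallest radius.
guess_overlap = gerschgorin_guess([0.0, 0.0, 0.0], [1.0, 0.5], 0.2)
```

The returned index is only a starting guess; as described above, $\gamma_r$ should still be checked and the full search used when it is not small enough.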

We  do  not  make  any  claim  about  the  optimality  of  the  above  heuristic,  and  it  may 
be  possible  to  obtain  better  ones.  The  purpose  of  this  section  was  to  furnish  the  idea  of 
heuristically  picking  r  and  checking  it  in  order  to  potentially  save  half  the  computations. 

3.5  Twisted  Q  Factorizations  —  A  Digression* 

In this section, we introduce twisted QR-type factorizations. The reader who is pressed for time and is primarily interested in seeing how to adapt inverse iteration to compute mutually orthogonal eigenvectors may skip to the next chapter.

Figure 3.2: Twisted Q Factorization at $k = 3$ (the next elements to be annihilated are circled): forming $N_k$.

We define a twisted orthogonal or Q factorization at position $k$ as

\[ J = N_k Q_k^* \tag{3.5.38} \]

where $Q_k$ is an orthogonal matrix and $N_k$ is such that $N_k(i,j) = 0$ for $i, j$ such that $j > i$ when $i < k$ and $k \le j < i$ when $i > k$. Note that $N_k$ is a “permuted” triangular matrix, since there exists a symmetric permutation of $N_k$ that is a triangular matrix. Figure 3.2 illustrates how to compute such a factorization: it may be obtained by starting an LQ factorization from the left of the matrix stopping it at column $k$, and then doing an RQ factorization from the right of the matrix till there is a singleton in the $k$th column. (Note that we are doing column operations here in contrast to row operations in Section 3.1; hence we refer to the left and right of the matrix rather than the top and bottom.) Using the fact that $N_k$ is “essentially” triangular and assuming that $J$ is of full rank, we can show that the twisted Q factorization (3.5.38) is unique, modulo the signs of the diagonal elements of $N_k$. We could have done row operations instead and looked at twisted Q factorizations that are obtained by doing QR from the top of the matrix, and QL from the bottom to obtain a singleton in the $k$th row. We have chosen instead to look at column twisted Q factorizations in order to make a direct connection with the twisted triangular factorizations of Section 3.1.

We  now  develop  some  theory  about  twisted  Q  factorizations  along  the  lines  of 
Section  3.1. 

Theorem 3.5.1 Let $J$ be a nonsingular tridiagonal $n \times n$ complex matrix with the column twisted Q factorization at $k$ given by (3.5.38) for $1 \le k \le n$. Let $\delta_k$ be the singleton in the $k$th column of $N_k$, i.e., $\delta_k = N_k(k,k)$. Then

\[ \frac{1}{|\delta_k|^2} = e_k^*(JJ^*)^{-1}e_k. \tag{3.5.39} \]

Proof.

\[
\begin{aligned}
JJ^* &= N_k Q_k^* Q_k N_k^* \\
\implies e_k^*(JJ^*)^{-1}e_k &= e_k^* N_k^{-*} N_k^{-1} e_k.
\end{aligned}
\]

Since $N_k e_k = \delta_k e_k$, the result (3.5.39) follows. □

We could endeavor to get expressions for $\delta_k$, as we did for $\gamma_k$ in (3.1.12), in terms of the left and right factorizations. In particular, we can attempt to express $\delta_k$ in terms of the Schur parameters $\{\cos\theta_k^{+},\ k = 1, \ldots, n-1\}$, $\{\cos\theta_k^{-},\ k = 1, \ldots, n-1\}$ that determine the orthogonal matrices in the $L_+Q_+$ and $R_-Q_-$ factorizations, and the diagonal elements of $L_+$ and $R_-$. However, we choose not to do so since the expressions are not stated simply with our present notation, and do not add to our exposition.

The following results parallel those of Sections 3.1 and 3.2 as do the corresponding proofs.

Corollary 3.5.1 Let $J$ be a nonsingular tridiagonal matrix. With the notation of Theorem 3.5.1,

\[ \sum_{i=1}^{n} |\delta_i|^{-2} = \mathrm{Trace}\big((JJ^*)^{-1}\big) = \sum_{i=1}^{n} \sigma_i^{-2}, \]

where $\sigma_1, \sigma_2, \dots, \sigma_n$ are the singular values of $J$.

Theorem 3.5.2 Let $J - \mu I$ be a tridiagonal matrix with the twisted factorization (3.5.38) for a fixed value of $k$. Then

\[ (J - \mu I)\, q_k = \delta_k e_k, \]

where $q_k$ is the $k$th column of $Q_k$.

Proof. Since $N_k e_k = \delta_k e_k$ and $J - \mu I = N_k Q_k^*$,

\[ (J - \mu I) Q_k = N_k \;\Rightarrow\; (J - \mu I) q_k = \delta_k e_k. \]

□

Thus the vector $q_k$ is just a normalized version of $z^{(k)}$ defined in (3.2.25) in Theorem 3.2.2. The following theorem confirms that the corresponding residual norms are equal.

Theorem 3.5.3 Let $J - \mu I$ be a nonsingular tridiagonal matrix and let $\delta_k$, $\gamma_k$ and $z^{(k)}$ be as defined in (3.5.39), (3.1.13) and (3.2.25) respectively. Then

\[ \frac{|\gamma_k|}{\|z^{(k)}\|} = |\delta_k|. \tag{3.5.40} \]

Proof. Since $(J - \mu I) z^{(k)} = \gamma_k e_k$,

\[ \frac{\|z^{(k)}\|^2}{|\gamma_k|^2} = e_k^* (J - \mu I)^{-*} (J - \mu I)^{-1} e_k = \frac{1}{|\delta_k|^2}. \]

The last equality is just (3.5.39). □

Thus when $J$ is normal, the bound (3.2.28) for $|\gamma_k| / \|z^{(k)}\|$ applies to $|\delta_k|$. The connection between $q_k$ and $z^{(k)}$ suggests the following alternative to Algorithm 3.2.1 that is guaranteed to find a small $\delta_k$ for a nearly singular $J - \mu I$, even when the eigenvalue near $\mu$ is not isolated.

Algorithm 3.5.1 [Computing an eigenvector of any eigenvalue.]

1. Compute the diagonal elements $\delta_k$ for all $n$ twisted Q factorizations of $J - \mu I$.

2. Choose the index $r$ where $|\delta_r|$ is minimum.

3. Form the vector $q_r$, where $q_r$ is as in Theorem 3.5.2.
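The three steps above can be illustrated with a small dense sketch in Python. It is not the efficient procedure based on twisted factorizations: as an assumption made purely for illustration, it obtains $|\delta_k|$ from (3.5.39) as $1/\|(J-\mu I)^{-1}e_k\|$ and $q_k$ (up to sign) from Theorem 3.5.2 as the normalized solution of $(J-\mu I)x = e_k$, using a generic dense solver. The helper names `solve` and `twisted_q_vector`, the real test matrix and the shift are all hypothetical.

```python
import math

def solve(a, rhs):
    """Solve a x = rhs by Gaussian elimination with partial pivoting (dense)."""
    n = len(a)
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x

def twisted_q_vector(j_mat, mu):
    """Return (|delta_r|, r, q_r) for a real matrix J and shift mu (0-based r)."""
    n = len(j_mat)
    shifted = [[j_mat[i][k] - (mu if i == k else 0.0) for k in range(n)]
               for i in range(n)]
    best = None
    for k in range(n):
        e_k = [1.0 if i == k else 0.0 for i in range(n)]
        x = solve(shifted, e_k)                          # x = (J - mu I)^{-1} e_k
        delta = 1.0 / math.sqrt(sum(t * t for t in x))   # |delta_k| via (3.5.39)
        if best is None or delta < best[0]:
            best = (delta, k, [t * delta for t in x])    # unit vector, sign ignored
    return best

# Illustrative data: the 5 x 5 second-difference matrix, whose smallest
# eigenvalue is 2 - sqrt(3); the shift is that eigenvalue plus 1e-9.
n = 5
J = [[2.0 if i == k else (-1.0 if abs(i - k) == 1 else 0.0) for k in range(n)]
     for i in range(n)]
mu = 2.0 - math.sqrt(3.0) + 1e-9
delta_r, r, q_r = twisted_q_vector(J, mu)
residual = [sum((J[i][k] - (mu if i == k else 0.0)) * q_r[k] for k in range(n))
            for i in range(n)]
residual_norm = math.sqrt(sum(t * t for t in residual))
```

By Theorem 3.5.2 the residual norm of the returned vector equals $|\delta_r|$, so choosing the smallest $|\delta_k|$ is the same as choosing the smallest residual among the $n$ candidates.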


The above results may appear somewhat surprising but the reader is reminded of the intimate connection between the LR algorithm, the QR algorithm and inverse iteration [110]. For a non-normal matrix also, as the following theorem implies, there is at least one $\delta_k$ that indicates its nearness to singularity.

Theorem 3.5.4 Let $B$ be a nonsingular matrix of order $n$. Let $\sigma_{\min} = \sigma_n$ be the smallest singular value of $B$ and $u_{\min}$ be the corresponding left singular vector. If

\[ |\delta_k|^{-2} = e_k^* (BB^*)^{-1} e_k \quad \text{for } k = 1, 2, \dots, n \]

then

\[ |\delta_k| = \frac{\sigma_{\min}}{|u_{\min}(k)|} \left[ 1 + \left( |u_{\min}(k)|^{-2} - 1 \right) A_2 \right]^{-1/2} \le \frac{\sigma_{\min}}{|u_{\min}(k)|}, \]

and $|\delta_k| \le \sqrt{n}\, \sigma_{\min}$ for at least one $k$, where $A_2$ is a weighted arithmetic mean of $\{ (\sigma_{\min}/\sigma_i)^2,\ i < n \}$, $0 < A_2 \le (\sigma_{\min}/\sigma_{n-1})^2$.

Proof. The proof is similar to that of Theorem 3.2.3 using the following eigendecomposition of $BB^*$,

\[ BB^* = U \Sigma^2 U^*. \]

□

In Section 3.1, we showed that $\gamma_k = 0$ when $J$ is singular and admits the triangular decompositions (3.1.11) (see Corollary 3.1.4 and the comments immediately following it). Similarly, it can be shown that if $J$ is an unreduced tridiagonal that is singular, then $\delta_k = 0$ in the factorization of (3.5.38) for any $k$.

3.5.1  “Perfect”  Shifts  are  perfect 

In this section, we give an application of twisted Q factorizations. If $\lambda$ is an eigenvalue of the unreduced tridiagonal $J$, then the last diagonal element of $R$, $R_{nn}$, in the factorization

\[ J - \lambda I = QR, \quad Q^*Q = I, \quad R \text{ upper triangular} \]

must be zero in exact arithmetic. The QR iteration on $J$ with shift $\mu$ yields $J_1$:

\[ J - \mu I = QR, \quad RQ + \mu I = J_1. \]

When $\mu = \lambda$, the $(n-1, n)$ entry of the tridiagonal $J_1$ is zero and hence $\lambda$ may be deflated from $J_1$ by simply removing the last row and column.

It may be expected that if $\mu$ is close to $\lambda$, then $R_{nn}$ would be small and deflation would again occur in one iteration. Based on this expectation, Parlett proposed the following strategy to compute all the eigenvalues and eigenvectors of a real, symmetric tridiagonal matrix $T$ [110, p.173].

1.  Compute  eigenvalues  by  a  fast  algorithm  such  as  the  PWK  algorithm,  which  is  a  QR 
algorithm  free  of  square  roots. 

2.  Run  the  QR  algorithm  using  the  computed  eigenvalues  as  shifts,  accumulating  the 
rotations  to  form  the  eigenvector  matrix. 

The expectation was that deflation would occur in one step, thus saving approximately 50% of the dominant $O(n^3)$ computation in the QR algorithm, which typically requires an average of 2-3 iterations per eigenvalue.

However,

\[ T - \mu I = QR \;\Rightarrow\; e_n^* (T - \mu I)^{-2} e_n = \frac{1}{|R_{nn}|^2} \;\Rightarrow\; \sum_i \frac{|v_i(n)|^2}{(\lambda_i - \mu)^2} = \frac{1}{|R_{nn}|^2}, \]

where $(\lambda_i, v_i)$ are the eigenpairs of $T$, showing that if the bottom entry of the desired eigenvector is tiny, then $R_{nn}$ is not small. As a result, immediate deflation is not always seen in the perfect-shift QR algorithm, and it was found not to perform any better than the standard QR algorithm with Wilkinson's shift [69]. The situation is similar to that with $D_+(n)$ discussed in Section 3.1. And it is natural for us, as in Section 3.1, to turn to twisted factorizations to detect singularity of $T - \mu I$. Let

\[ T - \mu I = Q_k N_k \tag{3.5.41} \]

be a twisted Q factorization of $T - \mu I$. This is essentially identical to the column twisted Q factorization of $J$ in (3.5.38) since $T$ is symmetric. In the above factorization the $k$th row of $N_k$ is a singleton with a non-zero on the diagonal. By the theory developed in the previous section, in particular Theorem 3.5.4, there must be a $k$ such that $N_k(k,k)$ is small if $\mu$ is a good approximation to an eigenvalue of $T$.

With this choice of $k$, we can form the matrix

\[ A_1 = N_k Q_k + \mu I \tag{3.5.42} \]

following the standard QR algorithm. Since $\delta_k$ is small all the off-diagonal entries of the $k$th row of $A_1$ must be small. Since $A_1 = Q_k^* T Q_k$ is symmetric the off-diagonal entries in the $k$th column are also tiny. An eigenvalue close to $\mu$ may be deflated by crossing out the entire $k$th row and column of $A_1$. Thus deflation occurs in exactly one step when perfect shifts are used. For a more detailed treatment, the reader should wait for [41].

Each matrix in the sequence obtained in the standard QR algorithm is tridiagonal. Does (3.5.42) preserve tridiagonal form? The reduction uniqueness theorem in [110, Section 7-2] states that a tridiagonal formed by an orthogonal similarity transformation, $T = Q^*AQ$, is determined by $A$ and the first or last column of $Q$. It follows that $A_1$ cannot be tridiagonal in general since, except when $k = 1$ or $n$, the first and last columns of $Q_k$ in (3.5.41) are independent. However, it can be shown that the matrix obtained by deleting the $k$th row and column of $A_1$ is tridiagonal with an additional bulge at new position $(k+1, k)$ after deflation. This bulge can then be “chased” out of the top or bottom of the matrix, whichever is nearer. This yields a tridiagonal matrix to which the next perfect shift may now be applied.

To form the eigenvector matrix the $n-1$ rotations that form $Q_k$ must be accumulated, as must the rotations used to chase the bulge out of the end of the deflated matrix. Thus in the worst case 1.5 iterations are needed per eigenvalue to compute the eigenvectors. Assuming 1-1.5 iterations per eigenvalue, all the eigenvectors may be obtained at a cost of $3n^3$ to $4.5n^3$ operations. This is about twice as fast as current QR algorithms, assuming that $n$ is large enough to disregard the extra computation involved in the initial computation of the eigenvalues and in finding $k$ at each step. Heuristics for guessing $k$ may also be used as discussed in Section 3.4.1.

Standard  QR  algorithms  use  an  implicit  version  to  achieve  greater  accuracy.  It 
remains  to  be  seen  if  similar  techniques  may  be  used  in  the  algorithm  outlined  above.  It 
should  also  be  possible  to  find  k  by  using  a  procedure  free  from  square  roots  as  in  the  PWK 
algorithm.  But  as  we  mentioned  earlier,  this  section  is  a  digression  from  the  main  theme  of 
this  thesis  and  we  plan  to  explore  these  questions  elsewhere  [41]. 

The reader may wonder about our inconsistency in finding the eigenvalues by a standard QR algorithm but using twisted factorizations in finding the eigenvectors. Is it not possible to use twisted factorizations to find the eigenvalues also? We believe that the latter might indeed be the right approach not only in speeding up the algorithm but also to get higher accuracy since the standard QR algorithm may violently rearrange the matrix thereby destroying accuracy [32]. However, this is not straightforward due to the complication that $A_1$ in (3.5.42) is not tridiagonal. When deflation does not occur, chasing the bulges in $A_1$ off the top or bottom of the matrix will give the standard QR or QL algorithm respectively (by the reduction uniqueness theorem in [110, Section 7-2]). Thus in order to develop a “twisted algorithm”, we must give up the tridiagonal nature of the intermediate matrices. The number of bulges increases by 2 at each step, and it is for future research to determine if it is computationally feasible to handle this increase. We also believe that such a twisted algorithm may lead to better shifts than the Francis or Wilkinson shifts.

3.6 Rank Revealing Factorizations*

In earlier sections of this study, we exhibited twisted triangular and twisted orthogonal (or Q) factorizations for tridiagonals. In our applications, we were successfully able to reveal the singularity of the tridiagonal by one of the $n$ possible twisted factors.

In  this  section,  we  show  that  such  factorizations  can  also  be  done  for  denser  matri¬ 
ces  and  may  be  used  to  reveal  the  rank  of  the  matrix.  We  convey  our  ideas  through  pictures 
and  make  no  pretence  to  being  complete  in  this  section. 

Formally, we define a row twisted triangular factorization of a matrix $B$ at position $k$ as the decomposition

\[ B = N_k D_k \tilde{N}_k \tag{3.6.43} \]

where $D_k$ is diagonal, while $N_k$ and $\tilde{N}_k$ are such that $(N_k)_{ij} = 0$ for $i,j$ such that $j > i$ when $i < k$ and $k \le j < i$ otherwise, and $\tilde{N}_k^T$ has the same requirement on its zeros. We take both $N_k$ and $\tilde{N}_k$ to have 1's on their diagonals. Note that $N_k$ and $\tilde{N}_k$ are simultaneously permuted triangular matrices, i.e., there exists a permutation $P_k$ such that $P_k^T N_k P_k$ is lower triangular and $P_k^T \tilde{N}_k P_k$ is upper triangular. (3.6.43) may be written as

\[ P_k^T B P_k = (P_k^T N_k P_k)(P_k^T D_k P_k)(P_k^T \tilde{N}_k P_k) \]

and this is just the LDU decomposition of $P_k^T B P_k$. This implies the uniqueness of the factorization in (3.6.43).
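A minimal Python sketch of how the diagonal $D_k$ in (3.6.43) might be computed for a small dense matrix is given below. It assumes that every pivot encountered is nonzero, and it assumes the elimination order $1, \dots, k-1$, then $n, n-1, \dots, k+1$, then $k$, which is one ordering consistent with the zero pattern stated above; the function name `twisted_pivots` and the test matrix are illustrative only.

```python
def twisted_pivots(b, k):
    """Diagonal of D_k for a factorization of b twisted at 0-based index k.

    Pivots are taken from the top down to k-1, then from the bottom up to
    k+1, and finally at k, with no further pivoting (nonzero pivots assumed).
    """
    n = len(b)
    order = list(range(k)) + list(range(n - 1, k, -1)) + [k]
    m = [row[:] for row in b]
    pivots = [0.0] * n
    remaining = set(range(n))
    for p in order:
        pivots[p] = m[p][p]
        remaining.discard(p)
        # Schur complement update on the rows and columns not yet eliminated
        for i in remaining:
            f = m[i][p] / m[p][p]
            for j in remaining:
                m[i][j] -= f * m[p][j]
    return pivots

# Illustrative 3 x 3 example: the twist pivot D_k(k,k) for each position k.
B = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]
twist_pivots = [twisted_pivots(B, k)[k] for k in range(3)]
```

For this matrix the middle twist gives the pivots 2, 1, 2, and in each case the last pivot taken is the reciprocal of the corresponding diagonal entry of $B^{-1}$, which is why a nearly singular $B$ can be expected to show a small pivot for a suitable $k$.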

Figures 3.3 and 3.5 suggest a way to compute the twisted LU decomposition for Hessenberg and dense matrices respectively. Note that when $B$ is normal, Theorem 3.2.1 is applicable with $\gamma_k = D_k(k,k)$. Thus when $B$ is nearly singular with an isolated small eigenvalue, Theorem 3.2.1 implies the existence of a small bottom pivot in all possible LU-factorizations of $B$ that can be obtained by symmetric permutations of the rows and columns of $B$. A generalization of such a result was proved by Tony Chan in 1984 [19] (he considered non-normal $B$ as well, and arbitrary row and column permutations of $B$). We briefly compare our results with Chan's results of [19] at the end of this section.

We now formally define a row twisted Q decomposition of $B$ at position $k$ as

\[ B = Q_k N_k \tag{3.6.44} \]

where $Q_k$ is orthogonal and $N_k$ is a permuted triangular matrix as made precise earlier in this section. Figures 3.4 and 3.5 suggest computational procedures to find such a factorization for Hessenberg and dense matrices respectively. Since there exists a permutation $P_k$ such that $P_k^T N_k P_k$ is upper triangular, (3.6.44) may be written as

\[ B P_k = Q_k P_k (P_k^T N_k P_k) \]

to give a QR decomposition of $B$ after a column permutation. Thus the factorization (3.6.44) is unique in the same sense as the QR factorization of $B$ is unique.

[Figure 3.3: Twisted Triangular Factorization of a Hessenberg matrix at k = 3 (the next elements to be annihilated are circled)]

[Figure 3.4: Twisted Orthogonal Factorization of a Hessenberg matrix at k = 3 (the next elements to be annihilated are circled)]

[Figure 3.5: The above may be thought of as a twisted triangular or orthogonal factorization of a dense matrix at k = 3 (the next elements to be annihilated are circled)]

The result of Theorem 3.5.4 holds for the decomposition (3.6.44) with $J$ replaced by $B^*$. This proves the existence of a column permutation of $B$ such that the bottom element of $R$ in $B$'s QR decomposition is tiny when $B$ is nearly singular. This result was proved by Chan in [18] and earlier by Golub, Klema and Stewart in [66].

Thus  twisted  factorizations  can  be  rank-revealing.  Rank-revealing  LU  and  QR 
factorizations  have  been  extensively  studied  and  several  algorithms  to  compute  such  fac¬ 
torizations  exist.  Twisted  factorizations  seem  to  have  been  overlooked  and  may  offer  com¬ 
putational  advantages.  We  consider  our  results  outlined  above  to  be  stronger  than  those 
of  Chan  [19,  18]  and  Golub  et  al.  [66]  since  the  permutations  we  consider  are  restricted. 
In  particular,  as  seen  in  the  tridiagonal  and  Hessenberg  case,  twisted  factorizations  respect 
the  sparsity  structure  of  the  given  matrix ,  and  thus  may  offer  computational  advantages  in 
terms  of  speed  and  accuracy. 

We believe that twisted factorizations of Hessenberg matrices can be used for a better solution to the non-symmetric eigenproblem. For banded and sparse matrices, twisted factorizations may offer considerable savings in computing rank-revealing factorizations. We intend to conduct further research in these two areas. The innovative reader may be able to think of other applications.

Chapter 4

Computing  orthogonal 
eigenvectors  when  relative  gaps 
are  large 


In  the  last  chapter,  we  showed  how  to  compute  the  eigenvector  corresponding  to 
an  isolated  eigenvalue  of  a  normal,  unreduced  tridiagonal  matrix.  In  particular,  we  showed 
how  to  choose  a  right  hand  side  for  inverse  iteration  that  has  a  guaranteed  large  component 
in  the  direction  of  the  desired  eigenvector.  Even  though  this  is  both  a  theoretical  and 
practical  advance,  it  does  not  solve  the  most  pressing  problem  with  inverse  iteration  — 
that  of  computing  approximate  eigenvectors  that  are  numerically  orthogonal. 

When  eigenvalues  are  well  separated,  vectors  that  give  a  small  residual  norm, 
as  in  (1.1.1),  are  numerically  orthogonal.  In  such  a  case,  the  methods  of  Chapter  3  are 
sufficient.  However  this  is  not  the  case  with  close  eigenvalues  and  current  implementations 
of  inverse  iteration  resort  to  explicit  orthogonalization.  As  mentioned  in  Chapter  1  and 
Section  2.8.1  this  can  take  an  inordinately  large  amount  of  computation,  and  even  then, 
the  computed  vectors  may  not  be  numerically  orthogonal. 

The rest of this thesis is devoted to the computation of “good eigenvectors” that are automatically numerically orthogonal thus avoiding the need for explicit orthogonalization. In this chapter, we show that an alternate representation of a tridiagonal is the key to better approximations. Coupled with the theory developed in Chapter 3 this allows us to compute orthogonal eigenvectors when the corresponding eigenvalues have large relative gaps, even though the eigenvalues may be very close together in an absolute sense. More precisely,

1. In Section 4.1, we extol the virtues of high accuracy. Tridiagonals do not always “allow” such high accuracy computations and in Section 4.2, we advocate representing the tridiagonal as a product of bidiagonal matrices.

2.  In  Section  4.3,  we  recap  earlier  work  on  relative  perturbation  theory  as  applied  to 
bidiagonal  matrices. 

3.  In  Section  4.4.1,  we  give  qd-like  recurrences  that  allow  us  to  exploit  the  properties  of 
bidiagonals  in  order  to  compute  highly  accurate  approximations  to  eigenvectors.  In 
Section  4.4.2,  we  do  a  roundoff  error  analysis  of  their  computer  implementations  while 
in  Section  4.4.3  we  give  an  algorithm  to  compute  eigenvectors  based  on  these  qd-like 
recurrences. 

4. In Section 4.5 we prove that the dot products between the eigenvectors computed by the algorithm given in Section 4.4.3 are inversely proportional to the relative gaps between eigenvalues. As a consequence, the computed eigenvectors are numerically orthogonal if the corresponding eigenvalues are relatively far apart. These results are new and are a major advance towards our goal of obtaining guaranteed numerical orthogonality in $O(n^2)$ time.

5.  In  Section  4.6,  we  present  numerical  results  that  support  the  above  claims. 

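We consider the case of eigenvalues that have small relative gaps in the next two chapters.

\[ y_2 = (T_0 - \hat{\lambda}_2 I)^{-1} b_2 = \frac{\eta_2}{\lambda_2 - \hat{\lambda}_2} \left( \frac{\eta_1}{\eta_2} \frac{\lambda_2 - \hat{\lambda}_2}{\lambda_1 - \hat{\lambda}_2}\, v_1 + v_2 + \frac{\eta_3}{\eta_2} \frac{\lambda_2 - \hat{\lambda}_2}{\lambda_3 - \hat{\lambda}_2}\, v_3 \right). \tag{4.1.2} \]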


We assume that $\xi_1$, $\xi_2$, $\eta_1$ and $\eta_2$ are $O(1)$. Earlier we showed that $y_1$ and $y_2$ are nearly parallel if $\hat{\lambda}_1 \approx \hat{\lambda}_2$, and in such a case, EISPACK and LAPACK implementations fail to deliver orthogonal vectors despite explicit orthogonalization. See Section 2.8.1 and Case Study A for more details. It is clearly desirable that $\hat{\lambda}_1$ and $\hat{\lambda}_2$ be more accurate so that their difference is discernible. Given limited precision to represent numbers in a computer, how accurate can $\hat{\lambda}_1$ and $\hat{\lambda}_2$ be? We say that $\hat{\lambda}_i$ agrees with $\lambda_i$ to $d$ digits if

\[ \frac{|\lambda_i - \hat{\lambda}_i|}{|\lambda_i|} = 10^{-d}. \]

The IEEE double precision format allows for 53 bits of precision, thus $\varepsilon = 2^{-52} \approx 2.2 \cdot 10^{-16}$ and $d$ can take a maximum value of about 16. Suppose $\hat{\lambda}_1$ and $\hat{\lambda}_2$ agree with $\lambda_1$ and $\lambda_2$ respectively to $d$ digits, where $d \ge 1$. Since $\lambda_1$ and $\lambda_2$ do not agree in any digit,

\[ \frac{|\lambda_i - \hat{\lambda}_i|}{|\lambda_j - \hat{\lambda}_i|} = O(10^{-d}) \quad \text{where } i = 1, 2,\ j \ne i \]

and by (4.1.1) and (4.1.2),

\[ \frac{|y_1^T y_2|}{\|y_1\| \cdot \|y_2\|} = O\!\left( \frac{|\lambda_1 - \hat{\lambda}_1|}{|\lambda_2 - \hat{\lambda}_1|} + \frac{|\lambda_2 - \hat{\lambda}_2|}{|\lambda_1 - \hat{\lambda}_2|} \right) = O(10^{-d}). \]

Thus the more accurate $\hat{\lambda}_1$ and $\hat{\lambda}_2$ are, the more orthogonal are $y_1$ and $y_2$. In fact when $d = 16$, i.e., when $\hat{\lambda}_1$ and $\hat{\lambda}_2$ have full relative accuracy, $y_1$ and $y_2$ are numerically orthogonal and no further orthogonalization is needed.
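The dependence of orthogonality on the relative accuracy of the shifts can be seen in a toy model. The Python sketch below is an illustration under stated assumptions, not the thesis's experiment: it takes a diagonal matrix (so that the eigenvectors are the coordinate vectors and one step of inverse iteration is a componentwise division), a starting vector with all components equal to 1, and approximate eigenvalues with a prescribed relative error $10^{-d}$. All names and numerical values are hypothetical.

```python
import math

def cosine_between_iterates(lams, shift1, shift2, b1, b2):
    """|cos| of the angle between y1 = (T - shift1 I)^{-1} b1 and
    y2 = (T - shift2 I)^{-1} b2 for the diagonal model T = diag(lams)."""
    y1 = [x / (lam - shift1) for lam, x in zip(lams, b1)]
    y2 = [x / (lam - shift2) for lam, x in zip(lams, b2)]
    dot = sum(p * q for p, q in zip(y1, y2))
    norm1 = math.sqrt(sum(p * p for p in y1))
    norm2 = math.sqrt(sum(q * q for q in y2))
    return abs(dot) / (norm1 * norm2)

# Two tiny eigenvalues that are close in an absolute sense but do not
# agree in any digit, plus one eigenvalue of order 1.
lams = [1e-10, 2e-10, 1.0]
b = [1.0, 1.0, 1.0]            # starting vector with O(1) components

cosines = {}
for d in (4, 8, 12):
    rel_err = 10.0 ** (-d)     # shifts agree with the eigenvalues to d digits
    approx1 = lams[0] * (1.0 + rel_err)
    approx2 = lams[1] * (1.0 + rel_err)
    cosines[d] = cosine_between_iterates(lams, approx1, approx2, b, b)
```

In this model the cosine between the two iterates is of the order of $10^{-d}$ for each $d$, in line with the estimate above: an absolute error of $10^{-16}$ in eigenvalues of size $10^{-10}$ would only correspond to $d = 6$.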

Of  course,  the  above  is  true  only  in  exact  arithmetic.  We  now  see  why  the  standard 
representation  of  a  tridiagonal  matrix  does  not  allow  computation  of  eigenvalues  to  such 
high  accuracy. 


4.2  Tridiagonals  Are  Inadequate 

A real, symmetric tridiagonal matrix $T$ is traditionally represented by its $2n-1$ diagonal and off-diagonal elements. In this section, we show that for our computational purposes it is better to represent $T$ by its bidiagonal factors.

To account for the roundoff errors that occur in a computer implementation, it is common practice to show that the computed eigenvalues and eigenvectors are exact for a slightly perturbed matrix. For highly accurate algorithms such as bisection we can show that the eigenvalues computed for $T$ are exact eigenvalues of $T + \delta T$ where $\delta T$ represents a small componentwise perturbation in only the off-diagonals of $T$. In the following example, we show that such a perturbation can cause a large relative change in the eigenvalues and eigenvector entries.


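Example 4.2.1 [Tridiagonals are inadequate.] Consider the tridiagonal

\[ T_1 = \begin{pmatrix} 1 - \sqrt{\varepsilon} & \varepsilon^{1/4}\sqrt{1 - 7\varepsilon/4} & 0 \\ \varepsilon^{1/4}\sqrt{1 - 7\varepsilon/4} & \sqrt{\varepsilon} + 7\varepsilon/4 & \varepsilon/4 \\ 0 & \varepsilon/4 & 3\varepsilon/4 \end{pmatrix} \]

and a small relative perturbation

\[ T_1 + \delta T_1 = \begin{pmatrix} 1 - \sqrt{\varepsilon} & \varepsilon^{1/4}(1+\varepsilon)\sqrt{1 - 7\varepsilon/4} & 0 \\ \varepsilon^{1/4}(1+\varepsilon)\sqrt{1 - 7\varepsilon/4} & \sqrt{\varepsilon} + 7\varepsilon/4 & \varepsilon(1+\varepsilon)/4 \\ 0 & \varepsilon(1+\varepsilon)/4 & 3\varepsilon/4 \end{pmatrix} \]

where $\varepsilon$ is the machine precision. The two smallest eigenvalues of $T_1$ and $T_1 + \delta T_1$ are$^1$:

\[ \lambda_1 = \varepsilon/2 + \varepsilon^{3/2}/8 + O(\varepsilon^2), \qquad \lambda_1 + \delta\lambda_1 = \varepsilon/2 - 7\varepsilon^{3/2}/8 + O(\varepsilon^2) \]

and

\[ \lambda_2 = \varepsilon - \varepsilon^{3/2}/8 + O(\varepsilon^2), \qquad \lambda_2 + \delta\lambda_2 = \varepsilon - 9\varepsilon^{3/2}/8 + O(\varepsilon^2). \]

Thus

\[ \left| \frac{\delta\lambda_i}{\lambda_i} \right| = O(\sqrt{\varepsilon}), \quad i = 1, 2 \]

and the relative change in these eigenvalues is much larger than the initial relative perturbations in the entries of $T_1$. Similarly the corresponding eigenvectors of $T_1$ and $T_1 + \delta T_1$ are:

\[ v_1 = \begin{pmatrix} \frac{\varepsilon^{1/4}}{\sqrt{2}}\left(1 + \frac{\sqrt{\varepsilon}}{2}\right) + O(\varepsilon^{5/4}) \\ -\frac{1}{\sqrt{2}}\left(1 - \frac{\sqrt{\varepsilon}}{2}\right) + O(\varepsilon) \\ \frac{1}{\sqrt{2}}\left(1 - \frac{3\varepsilon}{4}\right) + O(\varepsilon^{3/2}) \end{pmatrix}, \qquad v_1 + \delta v_1 = \begin{pmatrix} \frac{\varepsilon^{1/4}}{\sqrt{2}}\left(1 + \frac{5\sqrt{\varepsilon}}{2}\right) + O(\varepsilon^{5/4}) \\ -\frac{1}{\sqrt{2}}\left(1 + \frac{3\sqrt{\varepsilon}}{2}\right) + O(\varepsilon) \\ \frac{1}{\sqrt{2}}\left(1 - 2\sqrt{\varepsilon}\right) + O(\varepsilon) \end{pmatrix} \]

and

\[ v_2 = \begin{pmatrix} -\frac{\varepsilon^{1/4}}{\sqrt{2}}\left(1 + \frac{\sqrt{\varepsilon}}{2}\right) + O(\varepsilon^{5/4}) \\ \frac{1}{\sqrt{2}}\left(1 - \frac{\sqrt{\varepsilon}}{2}\right) + O(\varepsilon) \\ \frac{1}{\sqrt{2}}\left(1 + \frac{3\varepsilon}{4}\right) + O(\varepsilon^{3/2}) \end{pmatrix}, \qquad v_2 + \delta v_2 = \begin{pmatrix} -\frac{\varepsilon^{1/4}}{\sqrt{2}}\left(1 - \frac{3\sqrt{\varepsilon}}{2}\right) + O(\varepsilon^{5/4}) \\ \frac{1}{\sqrt{2}}\left(1 - \frac{5\sqrt{\varepsilon}}{2}\right) + O(\varepsilon) \\ \frac{1}{\sqrt{2}}\left(1 + 2\sqrt{\varepsilon}\right) + O(\varepsilon) \end{pmatrix} \]

whereby

\[ \left| \frac{\delta v_i(j)}{v_i(j)} \right| = O(\sqrt{\varepsilon}) \quad \text{for } i = 1, 2 \text{ and } j = 1, 2, 3. \]

$^1$We artfully constructed this matrix to have the desired behavior which may be verified by using a symbol manipulator such as Maple [21] or Mathematica [137].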

Since a small relative change of $\varepsilon$ in the off-diagonal entries of $T_1$ results in a much larger relative change in its eigenvalues and eigenvectors, we say that $T_1$ does not determine its eigenvalues and eigenvector components to high relative accuracy. Consequently, it is unlikely that we can compute numerically orthogonal eigenvectors without explicit orthogonalization. To confirm this, we turned off all orthogonalization in the EISPACK and LAPACK implementations of inverse iteration and found the computed vectors to have dot products as large as $O(\sqrt{\varepsilon})$. For more details, see Case Study B. □


The situation can be retrieved by computing the bidiagonal Cholesky factor of $T_1$,

\[ T_1 = L_1 L_1^T. \]

It  is  now  known  that  small  relative  changes  to  the  entries  of  a  bidiagonal  matrix  cause  small 
relative  changes  to  its  singular  values  (note  that  the  singular  values  of  L\  are  the  square 
roots  of  the  eigenvalues  of  T\).  When  relative  gaps  between  the  singular  values  of  L\  are 
large,  it  is  also  known  that  the  changes  in  the  corresponding  singular  vectors  are  small. 
In  the  rest  of  this  chapter,  we  show  how  to  exploit  this  property  of  bidiagonal  matrices  to 
compute  numerically  orthogonal  eigenvectors  without  any  explicit  orthogonalization. 
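The contrast between the two representations can be checked numerically. The Python sketch below is illustrative only and rests on several assumptions: it uses a stand-in value $\varepsilon = 10^{-8}$ (much larger than the double precision unit roundoff, so that ordinary double precision arithmetic can resolve the effect), it computes eigenvalues by a plain Sturm-count bisection written for this example, and it perturbs the diagonal of the Cholesky factor by $1+\varepsilon$ and its subdiagonal by $1-\varepsilon$. The function names are hypothetical.

```python
import math

def sturm_count(diag, off, x):
    """Number of eigenvalues below x of the symmetric tridiagonal (diag, off)."""
    count, q = 0, 1.0
    for i, d in enumerate(diag):
        q = d - x - (off[i - 1] ** 2 / q if i > 0 else 0.0)
        if q == 0.0:
            q = -1e-300          # treat an exact zero pivot as tiny and negative
        if q < 0.0:
            count += 1
    return count

def eigenvalue(diag, off, k, lo=-2.0, hi=3.0):
    """k-th smallest eigenvalue (k = 0, 1, ...) by bisection on [lo, hi]."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if sturm_count(diag, off, mid) > k:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def cholesky_bidiagonal(diag, off):
    """Lower bidiagonal L (diagonal a, subdiagonal b) with T = L L^T."""
    a, b = [math.sqrt(diag[0])], []
    for i in range(len(off)):
        b.append(off[i] / a[i])
        a.append(math.sqrt(diag[i + 1] - b[i] ** 2))
    return a, b

def product_llt(a, b):
    """Diagonal and off-diagonal of the tridiagonal L L^T."""
    diag = [a[0] ** 2] + [b[i] ** 2 + a[i + 1] ** 2 for i in range(len(b))]
    off = [a[i] * b[i] for i in range(len(b))]
    return diag, off

def relative_changes(t, t_pert):
    """Relative changes in the two smallest eigenvalues."""
    lam = [eigenvalue(t[0], t[1], k) for k in (0, 1)]
    lam_pert = [eigenvalue(t_pert[0], t_pert[1], k) for k in (0, 1)]
    return [(lp - l) / l for l, lp in zip(lam, lam_pert)]

eps = 1e-8                       # stand-in for the machine precision
s = math.sqrt(eps)
diag = [1 - s, s + 7 * eps / 4, 3 * eps / 4]
off = [eps ** 0.25 * math.sqrt(1 - 7 * eps / 4), eps / 4]

# (i) relative perturbation eps in the off-diagonals of T_1
tri_changes = relative_changes((diag, off),
                               (diag, [x * (1 + eps) for x in off]))

# (ii) relative perturbations of size eps in the entries of L_1
a, b = cholesky_bidiagonal(diag, off)
a_pert = [x * (1 + eps) for x in a]
b_pert = [x * (1 - eps) for x in b]
bid_changes = relative_changes(product_llt(a, b), product_llt(a_pert, b_pert))
```

With these choices the expansions of Example 4.2.1 predict relative changes of about $-2\sqrt{\varepsilon}$ and $-\sqrt{\varepsilon}$ in `tri_changes`, whereas the entries of `bid_changes` should stay at the level of a small multiple of $\varepsilon$.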

The  relative  perturbation  results  for  bidiagonal  matrices  mentioned  above  have 
appeared  in  [89,  29,  35,  49,  50,  51,  100,  101].  We  state  the  precise  results  in  the  next 
section. 


4.3  Relative  Perturbation  Theory  for  Bidiagonals 

In 1966, Kahan proved the remarkable result that a real, symmetric tridiagonal with zero entries on the diagonal determines its eigenvalues to high relative accuracy with respect to small relative perturbations in its off-diagonal elements. This result appears to have lain neglected in the technical report [89] until Demmel and Kahan used it in 1990 to devise a method to compute the singular values of a bidiagonal matrix to high relative accuracy [35]. Subsequently, these results have been extended and simplified, and we cite some contributions during the course of this section.

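Consider the lower bidiagonal matrix

\[ L = \begin{pmatrix} a_1 & & & & 0 \\ b_1 & a_2 & & & \\ & b_2 & a_3 & & \\ & & \ddots & \ddots & \\ 0 & & & b_{n-1} & a_n \end{pmatrix} \tag{4.3.3} \]

and a componentwise perturbation

\[ L + \delta L = \begin{pmatrix} a_1\alpha_1 & & & & 0 \\ b_1\alpha_2 & a_2\alpha_3 & & & \\ & b_2\alpha_4 & a_3\alpha_5 & & \\ & & \ddots & \ddots & \\ 0 & & & b_{n-1}\alpha_{2n-2} & a_n\alpha_{2n-1} \end{pmatrix} \tag{4.3.4} \]

where $\alpha_i > 0$.

The key fact is that $L + \delta L$ may be written as

\[ L + \delta L = D_1 L D_2 \]

where

\[ D_1 = \mathrm{diag}\left( 1,\ \frac{\alpha_2}{\alpha_1},\ \frac{\alpha_2\alpha_4}{\alpha_1\alpha_3},\ \frac{\alpha_2\alpha_4\alpha_6}{\alpha_1\alpha_3\alpha_5},\ \dots,\ \frac{\alpha_2\alpha_4\alpha_6 \cdots \alpha_{2n-2}}{\alpha_1\alpha_3\alpha_5 \cdots \alpha_{2n-3}} \right) \]

and

\[ D_2 = \mathrm{diag}\left( \alpha_1,\ \frac{\alpha_1\alpha_3}{\alpha_2},\ \frac{\alpha_1\alpha_3\alpha_5}{\alpha_2\alpha_4},\ \dots,\ \frac{\alpha_1\alpha_3\alpha_5 \cdots \alpha_{2n-1}}{\alpha_2\alpha_4 \cdots \alpha_{2n-2}} \right). \]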
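The factored form of the perturbation is easy to check numerically. The Python sketch below builds the diagonals of $D_1$ and $D_2$ by a two-term recurrence, which is our own restatement of the products above: entry $i$ of $D_1$ times entry $i$ of $D_2$ must equal the factor multiplying $a_i$, and entry $i+1$ of $D_1$ times entry $i$ of $D_2$ must equal the factor multiplying $b_i$. The function name and the sample numbers are illustrative assumptions.

```python
import math

def scaling_diagonals(alpha):
    """Diagonals of D1 and D2 for alpha = [alpha_1, ..., alpha_{2n-1}].

    alpha[2*i] multiplies the diagonal entry a_{i+1}; alpha[2*i - 1]
    multiplies the subdiagonal entry b_i (0-based list positions).
    """
    n = (len(alpha) + 1) // 2
    d1, d2 = [1.0], [alpha[0]]
    for i in range(1, n):
        d1.append(alpha[2 * i - 1] / d2[i - 1])   # D1(i+1) * D2(i) = alpha_{2i}
        d2.append(alpha[2 * i] / d1[i])           # D1(i+1) * D2(i+1) = alpha_{2i+1}
    return d1, d2

# Illustrative 3 x 3 lower bidiagonal: diagonal a, subdiagonal b.
a = [2.0, 3.0, 5.0]
b = [7.0, 11.0]
alpha = [1.1, 0.9, 1.05, 0.95, 1.2]
d1, d2 = scaling_diagonals(alpha)

# Entries of D1 * L * D2, to be compared with the entries of L + delta L.
diag_scaled = [d1[i] * a[i] * d2[i] for i in range(3)]
sub_scaled = [d1[i + 1] * b[i] * d2[i] for i in range(2)]
```

Because $D_1$ and $D_2$ are diagonal, the product $D_1 L D_2$ keeps the bidiagonal structure and only rescales each entry, which is what makes the multiplicative perturbation theory of this section applicable to componentwise perturbations.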

In [49], Eisenstat and Ipsen considered such multiplicative perturbations for singular value problems and proved the results presented below which show that if $D_1$ and $D_2$ are close to unitary matrices then the singular values of $L + \delta L$ are close in a relative measure to the corresponding singular values of $L$. These results are also an immediate consequence of Ostrowski's theorem in [80, Thm. 4.5.9]. In the following, $\sigma_j$ and $\sigma_j + \delta\sigma_j$ denote the $j$th singular values of $L$ and $L + \delta L$ respectively, while $u_j$, $u_j + \delta u_j$, $v_j$ and $v_j + \delta v_j$ denote the corresponding left and right singular vectors.

Theorem 4.3.1 (Eisenstat and Ipsen [49, Thm. 3.1]). Let $L + \delta L = D_1^T L D_2$, where $D_1$ and $D_2$ are nonsingular matrices. Then

\[ \frac{\sigma_j}{\|D_1^{-1}\| \cdot \|D_2^{-1}\|} \le \sigma_j + \delta\sigma_j \le \sigma_j \|D_1\| \cdot \|D_2\|. \]

Corollary 4.3.1 (Barlow and Demmel [9, Thm. 1], Deift et al. [29, Thm. 2.12], Demmel and Kahan [35, Cor. 2], Eisenstat and Ipsen [49, Cor. 4.2]). Let $L$ and $L + \delta L$ be bidiagonal matrices as in (4.3.3) and (4.3.4). Then

\[ \frac{\sigma_j}{1 + \eta} \le \sigma_j + \delta\sigma_j \le (1 + \eta)\sigma_j, \]

where $\eta = \prod_{i=1}^{2n-1} \max\{|\alpha_i|, 1/|\alpha_i|\} - 1$.

Proof. The proof follows by noting that $\|D_1\| \cdot \|D_2\| \le 1 + \eta$ and $\|D_1^{-1}\| \cdot \|D_2^{-1}\| \le 1 + \eta$, and then applying Theorem 4.3.1. □
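To get a feeling for the size of $\eta$, consider the following rough estimate, which assumes (as an illustration, not a statement from the sources cited above) that every factor satisfies $|\alpha_i - 1| \le \varepsilon$ for some $\varepsilon \ll 1$. Then each factor in the product is at most $1/(1-\varepsilon)$, so that

\[ \max\{|\alpha_i|, 1/|\alpha_i|\} \le \frac{1}{1-\varepsilon} \quad\Rightarrow\quad \eta \le (1-\varepsilon)^{-(2n-1)} - 1 = (2n-1)\varepsilon + O(n^2\varepsilon^2). \]

Under this assumption, relative perturbations of size $\varepsilon$ in the $2n-1$ entries of the bidiagonal change each singular value by a relative amount of roughly $(2n-1)\varepsilon$ at most.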

More  recently,  in  [116]  Parlett  gives  relative  condition  numbers  that  indicate  the 
precise  amount  by  which  a  singular  value  changes  due  to  a  relative  perturbation  in  a  par¬ 
ticular  element  of  a  bidiagonal  matrix. 


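Theorem 4.3.2 (Parlett [116, Thm. 1]) Let $L$ be a bidiagonal matrix as in (4.3.3), with $a_i \ne 0$, $b_i \ne 0$. Let $\sigma$ denote a particular singular value of $L$ and let $u$, $v$ be the corresponding singular vectors. Then, since $\sigma \ne 0$,

\[ \text{(a)} \quad \frac{\partial\sigma}{\partial a_k} \cdot \frac{a_k}{\sigma} = \sum_{i=1}^{k} v(i)^2 - \sum_{j=1}^{k-1} u(j)^2 = \sum_{m=k}^{n} u(m)^2 - \sum_{l=k+1}^{n} v(l)^2, \]

\[ \text{(b)} \quad \frac{\partial\sigma}{\partial b_k} \cdot \frac{b_k}{\sigma} = \sum_{i=1}^{k} \left( u(i)^2 - v(i)^2 \right) = \sum_{m=k+1}^{n} \left( v(m)^2 - u(m)^2 \right). \]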


Traditional  error  bounds  on  singular  vector  perturbations  show  them  to  be  in¬ 
versely  proportional  to  the  absolute  gaps  between  the  corresponding  singular  values.  Recent 
work  has  shown  that  in  the  case  of  multiplicative  perturbations,  as  in  Theorem  4.3.1  above, 
absolute  gaps  may  be  replaced  by  relative  gaps. 

Before we quote the relevant results, we introduce some notation. We define the relative distance between two numbers $\alpha$ and $\beta$ as

\[ \mathrm{reldist}(\alpha, \beta) \stackrel{\mathrm{def}}{=} \frac{|\alpha - \beta|}{|\alpha|}. \tag{4.3.5} \]

By the above definition, $\mathrm{reldist}(\alpha, \beta) \ne \mathrm{reldist}(\beta, \alpha)$. On the other hand, the measures

\[ \mathrm{reldist}_p(\alpha, \beta) \stackrel{\mathrm{def}}{=} \frac{|\alpha - \beta|}{\sqrt[p]{|\alpha|^p + |\beta|^p}} \tag{4.3.6} \]

are symmetric for $1 \le p \le \infty$. Note that

\[ \mathrm{reldist}_\infty(\alpha, \beta) \stackrel{\mathrm{def}}{=} \frac{|\alpha - \beta|}{\max(|\alpha|, |\beta|)}. \tag{4.3.7} \]

For more discussion and comparison between these measures, see [100]. Relative gaps between singular values will figure prominently in our discussion, and we define

\[ \mathrm{relgap}(\nu, \{\sigma\}) \stackrel{\mathrm{def}}{=} \min_{y \in \{\sigma\}} \mathrm{reldist}(\nu, y), \tag{4.3.8} \]

where $\nu$ is a real number while $\{\sigma\}$ denotes a set of real numbers. Typically, $\{\sigma\}$ will denote a subset of the singular values. Similarly, we define the relative gaps for the symmetric measures to be

\[ \mathrm{relgap}_p(\nu, \{\sigma\}) \stackrel{\mathrm{def}}{=} \min_{y \in \{\sigma\}} \mathrm{reldist}_p(\nu, y). \]
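These measures translate directly into code. The short Python sketch below is one possible transcription, assuming the unsymmetric measure divides by the magnitude of its first argument; the function names mirror the notation of this section but are otherwise our own, and `p=None` is a convention of this sketch for selecting the unsymmetric measure.

```python
import math

def reldist(alpha, beta):
    """Unsymmetric relative distance: |alpha - beta| / |alpha|."""
    return abs(alpha - beta) / abs(alpha)

def reldist_p(alpha, beta, p=2.0):
    """Symmetric relative distance for 1 <= p <= infinity."""
    diff = abs(alpha - beta)
    if math.isinf(p):
        return diff / max(abs(alpha), abs(beta))
    return diff / (abs(alpha) ** p + abs(beta) ** p) ** (1.0 / p)

def relgap(nu, sigmas, p=None):
    """Smallest relative distance from nu to any member of sigmas."""
    if p is None:
        return min(reldist(nu, y) for y in sigmas)
    return min(reldist_p(nu, y, p) for y in sigmas)

# Two tiny values with absolute gap 1e-8 but relative gap 1.
example_gap = relgap(1e-8, [2e-8, 1.0])
```

The last line shows the point of working with relative gaps: the values $10^{-8}$ and $2 \cdot 10^{-8}$ are very close in the absolute sense, yet their relative gap is 1.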

The  following  theorem  bounds  the  perturbation  angle  in  terms  of  the  relative  gaps. 
It  is  a  special  case  of  Li’s  Theorem  4.7  [101]. 


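Theorem 4.3.3 Let $L + \delta L = D_1^T L D_2$, where $D_1$ and $D_2$ are nonsingular matrices. If $\sigma_i + \delta\sigma_i \ne \sigma_j$ for $i \ne j$, then

\[ |\sin \angle(u_j, u_j + \delta u_j)| \le \frac{\sqrt{\|I - D_1^T\|^2 + \|I - D_1^{-1}\|^2 + \|I - D_2^T\|^2 + \|I - D_2^{-1}\|^2}}{\mathrm{relgap}_2(\sigma_j, \{\sigma_i + \delta\sigma_i \mid i \ne j\})}. \]

Corollary 4.3.2 Let $L$ and $L + \delta L$ be bidiagonal matrices as in (4.3.3) and (4.3.4). If $\sigma_i + \delta\sigma_i \ne \sigma_j$ for $i \ne j$, then

\[ |\sin \angle(u_j, u_j + \delta u_j)| \le \frac{\eta}{\mathrm{relgap}_2(\sigma_j, \{\sigma_i + \delta\sigma_i \mid i \ne j\})}, \]

where $\eta = \prod_{i=1}^{2n-1} \max\{|\alpha_i|, 1/|\alpha_i|\} - 1$.

See also Eisenstat and Ipsen [49, Theorem 3.3 and Cor. 4.5].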

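The above results can be generalized to singular subspaces of larger dimension. Before we present the results, we need to introduce additional notation. We partition the singular value decomposition of the square matrices $L$ and $L + \delta L$ as

\[ L = U \Sigma V^T = (U_1, U_2) \begin{pmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{pmatrix} \begin{pmatrix} V_1^T \\ V_2^T \end{pmatrix} \]

and

\[ L + \delta L = (U_1 + \delta U_1,\ U_2 + \delta U_2) \begin{pmatrix} \Sigma_1 + \delta\Sigma_1 & 0 \\ 0 & \Sigma_2 + \delta\Sigma_2 \end{pmatrix} \begin{pmatrix} V_1^T + \delta V_1^T \\ V_2^T + \delta V_2^T \end{pmatrix}, \]

where $\Sigma_1 = \mathrm{diag}(\sigma_1, \dots, \sigma_k)$ and $\Sigma_2 + \delta\Sigma_2 = \mathrm{diag}(\sigma_{k+1} + \delta\sigma_{k+1}, \dots, \sigma_n + \delta\sigma_n)$ for $1 \le k < n$, and the other arrays are partitioned conformally. In such a case, we measure the angle between the eigenvector $u_j$ and the invariant subspace $U_1 + \delta U_1$, $j \le k$, as

\[ \|\sin \angle(u_j, U_1 + \delta U_1)\| = \|(U_2 + \delta U_2)^T u_j\|. \]

For more on angles between subspaces, see [67, Section 12.4.3]. The following is an easily obtained extension of Theorem 4.8 in [101].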

Theorem  4.3.4  Let  L  +  SL  =  DjLD 2,  where  D\  and  D2  are  nonsingular  matrices.  If 

max  Gi  +  <  min  07, 

&  <  7  <  71  l<l<k 

then  for  j  <  k  and  1  <  p  <  00, 

max  |  ^||/-jD-1||2  +  ||/-jD^||2,  ^||/ - Df1 1|2  +  \\I  -  i^||2} 
relgapp((Tj,  {at  +  Sat\k  <  i  <  n}) 

Corollary  4.3.3  Let  L  and  L  +  SL  be  bidiagonal  matrices  as  in  (4-3.3)  and  (4-3.4).  If 


sin  I(Uj,  U\  +  SUi)\\  < 


L  +  SL  =  (Ui  +  SU^Ui  +  SUi) 


max  Gi  +  bo{  <  min  gi, 
k<Ci<n  l<l<k 


then  for  j  <  k  and  1  <  p  <  00, 

||  sin  L(v,j,  U\  +  SUi)\\  < 

where  g  =  n^i”1  niax{|oy|,  l/|oy|}  —  1. 


relgapp(<7j,  {07  +  bGt\k  <  i  <  n})  ’ 


See  also  Eisenstat  and  Ipsen  [50,  Theorem  3.1]. 


4.4  Using  Products  of  Bidiagonals 


We now show how to exploit the properties of a bidiagonal matrix that were outlined in the previous section. Consider the tridiagonal matrix $T_1$ given in Example 4.2.1. In Section 4.2, we gave the inherent limitations of using the $2n-1$ diagonal and off-diagonal elements of $T_1$. Since $T_1$ is positive definite we can compute its bidiagonal Cholesky factor $\tilde L_1$. The singular values, $\sigma_j$, of $\tilde L_1$ may now be computed to high relative accuracy using either bisection or the much faster and more elegant dqds algorithm given in [56] (remember that in exact arithmetic the eigenvalues of $T_1$ are the squares of the singular values of $\tilde L_1$ and the eigenvectors of $T_1$ are the left singular vectors of $\tilde L_1$). Recall that the singular values of $\tilde L_1$ are such that

\[ \sigma_1^2 = \varepsilon/2 + O(\varepsilon^2), \quad \sigma_2^2 = \varepsilon + O(\varepsilon^2), \quad \text{and} \quad \sigma_3^2 = 1 + O(\varepsilon). \]

As a consequence of the large relative gaps between the singular values, the singular vectors of $\tilde L_1$ are “well-determined” with respect to small componentwise perturbations in entries of $\tilde L_1$. We can now compute these singular vectors by using a method similar to Algorithm 3.2.1 of Chapter 3. In spite of being computed independently, these vectors will turn out to be numerically orthogonal!

The errors made in computing the Cholesky factor $\tilde L_1$ are irrelevant to the orthogonality of the computed vectors. Cholesky factorization is known to be backward stable and it can be shown that the computed $\tilde L_1$ is the exact Cholesky factor of $T_1 + \delta T_1$ where $\|\delta T_1\| = O(\varepsilon\|T_1\|)$. Thus small residual norms with respect to $\tilde L_1 \tilde L_1^T$ translate to small residual norms with respect to $T_1$, i.e.,

\[ \|(\tilde L_1 \tilde L_1^T - \sigma^2 I)v\| = O(\varepsilon\|\tilde L_1 \tilde L_1^T\|) \;\Rightarrow\; \|(T_1 - \sigma^2 I)v\| = O(\varepsilon\|T_1\|). \]

The dual goals of orthogonality and small residual norms will thus be satisfied in this case (see (1.1.1) and (1.1.2)).

The  tridiagonal  matrix  of  Example  4.2.1  is  positive  definite.  In  general,  the  matrix 
T  whose  eigendecomposition  is  to  be  computed  will  be  indefinite.  In  such  a  case,  we  modify 
slightly  the  strategy  outlined  in  the  above  paragraph  by  shifting  T  to  make  it  definite.  We 
can  apply  this  transformation  since  the  eigenvectors  of  any  matrix  are  shift  invariant,  i.e., 

Eigenvectors of $T$ $\equiv$ Eigenvectors of $T + \mu I$ for all $\mu$.


Our  strategy  can  be  summarized  by  the  following  algorithm. 
Algorithm 4.4.1 [Computes eigenvectors using bidiagonals (preliminary version).]

1. Find $\mu \le \|T\|$ such that $T + \mu I$ is positive (or negative) definite.

2. Compute $T + \mu I = \tilde L \tilde L^T$.

3. Compute the singular values of $\tilde L$ to high relative accuracy (by bisection or the dqds algorithm).

4. For each computed singular value of $\tilde L$, compute its left singular vector by a method similar to Algorithm 3.2.1 of Chapter 3. □

We  now  complete  Step  4  of  the  above  algorithm  that  computes  an  individual 
singular  vector  of  L.  We  need  to  implement  this  step  with  care  in  order  to  get  guaranteed 
orthogonality,  whenever  possible.  Note  that  each  singular  vector  is  computed  independently 
of  the  others.  Corollary  4.3.2  implies  that  such  vectors  do  exist  when  the  singular  values 
have  large  relative  gaps  and  when  the  effects  of  roundoff  can  be  attributed  to  small  relative 
changes  in  the  entries  of  L. 

4.4.1  qd-like  Recurrences 

Before we proceed to the main content of this section, we make a slight change in our representation. Instead of the Cholesky factorization $\tilde L \tilde L^T$ of $T + \mu I$, we will consider its triangular decomposition $LDL^T$, where $L$ is unit lower bidiagonal of the form

\[ L = \begin{pmatrix} 1 & & & & 0 \\ l_1 & 1 & & & \\ & \ddots & \ddots & & \\ & & l_{n-2} & 1 & \\ 0 & & & l_{n-1} & 1 \end{pmatrix} \tag{4.4.9} \]

and

\[ D = \mathrm{diag}(d_1, d_2, \ldots, d_{n-1}, d_n). \tag{4.4.10} \]

Both factorizations are obviously linked and $\tilde L = L D^{1/2}$. The following theorem indicates that in terms of the relative perturbation theory presented in Section 4.3, both these representations are equivalent.
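Theorem 4.4.1 Let $LDL^T = \tilde L \Omega \tilde L^T$, where $L$, $D$ and $\tilde L$ are as in (4.4.9), (4.4.10) and (4.3.3) respectively, while $\Omega$ is a diagonal matrix with $\pm 1$ entries on its diagonal. Let $\lambda = \mathrm{sign}(\lambda)\sigma^2$ $(\neq 0)$ be a typical eigenvalue of $LDL^T$. Then

\[ \text{(a)} \quad \frac{\partial\lambda}{\partial d_k} \cdot \frac{d_k}{\lambda} = \frac{\partial\sigma}{\partial a_k} \cdot \frac{a_k}{\sigma} + \frac{\partial\sigma}{\partial b_k} \cdot \frac{b_k}{\sigma}, \tag{4.4.11} \]

\[ \text{(b)} \quad \frac{\partial\lambda}{\partial l_k} \cdot \frac{l_k}{\lambda} = 2\,\frac{\partial\sigma}{\partial b_k} \cdot \frac{b_k}{\sigma}. \tag{4.4.12} \]

Proof. Let $(\Omega)_{kk} = \omega_k$. The two factorizations are related by

\[ d_k = \omega_k a_k^2, \qquad l_k = b_k / a_k. \]

By applying the chain rule for derivatives,

\[ \frac{\partial\lambda}{\partial a_k} = \frac{\partial\lambda}{\partial d_k} \cdot \frac{\partial d_k}{\partial a_k} + \frac{\partial\lambda}{\partial l_k} \cdot \frac{\partial l_k}{\partial a_k}, \tag{4.4.13} \]

and

\[ \frac{\partial\lambda}{\partial b_k} = \frac{\partial\lambda}{\partial d_k} \cdot \frac{\partial d_k}{\partial b_k} + \frac{\partial\lambda}{\partial l_k} \cdot \frac{\partial l_k}{\partial b_k}. \tag{4.4.14} \]

By substituting

\[ \frac{\partial\lambda}{\partial x} = 2\sigma\,\mathrm{sign}(\lambda)\,\frac{\partial\sigma}{\partial x}, \quad \frac{\partial d_k}{\partial a_k} = 2\omega_k a_k, \quad \frac{\partial d_k}{\partial b_k} = 0, \quad \frac{\partial l_k}{\partial a_k} = -\frac{b_k}{a_k^2}, \quad \text{and} \quad \frac{\partial l_k}{\partial b_k} = \frac{1}{a_k} \]

in (4.4.13) and (4.4.14), we get

\[ 2\sigma\,\mathrm{sign}(\lambda)\,\frac{\partial\sigma}{\partial a_k} = \frac{\partial\lambda}{\partial d_k} \cdot 2\omega_k a_k - \frac{\partial\lambda}{\partial l_k} \cdot \frac{b_k}{a_k^2}, \tag{4.4.15} \]

and

\[ 2\sigma\,\mathrm{sign}(\lambda)\,\frac{\partial\sigma}{\partial b_k} = \frac{\partial\lambda}{\partial l_k} \cdot \frac{1}{a_k}. \tag{4.4.16} \]

The result (4.4.12) now follows from multiplying (4.4.16) by $b_k/\lambda$, while (4.4.11) is similarly obtained by substituting (4.4.12) in (4.4.15). □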

From now on, we will deal exclusively with the $LDL^T$ representation instead of the Cholesky factorization. This choice avoids the need to take square roots when forming the Cholesky factor.

To find an individual eigenvector by Algorithm 3.2.1, we needed to form the $L_+D_+L_+^T$ and $U_-D_-U_-^T$ decompositions of $T - \hat\lambda I$. Instead of $T$ we now have the factored matrix $LDL^T$. Algorithm 4.4.2 listed below implements the transformation

\[ LDL^T - \mu I = L_+ D_+ L_+^T. \tag{4.4.17} \]




We call this the “stationary quotient-difference with shift” (stqds) transformation for historical reasons. This term was first coined by Rutishauser for similar transformations that formed the basis of his qd algorithm first developed in 1954 [121, 122, 123]. Although (4.4.17) is not identical to the stationary transformation given by Rutishauser, the differences are not significant enough to warrant inventing new terminology. More recently, Fernando and Parlett have developed another qd algorithm that gives a fast way of computing the singular values of a bidiagonal matrix to high relative accuracy [56]. The term ‘stationary’ is used for (4.4.17) since it represents an identity transformation when $\mu = 0$. Rutishauser used the term ‘progressive’ instead for the formation of $U_-D_-U_-^T$ from $LDL^T$.

In the rest of this chapter, we will denote $L_+(i+1,i)$ by $L_+(i)$, $U_-(i,i+1)$ by $U_-(i)$ and the $i$th diagonal entries of $D_+$ and $D_-$ by $D_+(i)$ and $D_-(i)$ respectively.

Algorithm 4.4.2 (stqds)

    $D_+(1) := d_1 - \mu$
    for $i = 1, n-1$
        $L_+(i) := (d_i l_i)/D_+(i)$    (4.4.18)
        $D_+(i+1) := d_i l_i^2 + d_{i+1} - L_+(i)\, d_i l_i - \mu$    (4.4.19)
    end for

We now see how to eliminate some of the additions and subtractions from the above algorithm. We introduce the intermediate variable

\[ \begin{aligned} s_{i+1} &= D_+(i+1) - d_{i+1}, && (4.4.20) \\ &= d_i l_i^2 - L_+(i)\, d_i l_i - \mu, && \text{by (4.4.19)} \\ &= L_+(i)\, l_i\, (D_+(i) - d_i) - \mu, && \text{by (4.4.18)} \\ &= L_+(i)\, l_i\, s_i - \mu. && (4.4.21) \end{aligned} \]
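As an illustration only, Algorithm 4.4.2 can be transcribed into Python as follows. The function names and the 0-based list layout (`d` of length $n$, `l` of length $n-1$) are our assumptions, and no safeguard against a zero pivot $D_+(i)$ is included.

```python
def stqds(d, l, mu):
    """Sketch of Algorithm 4.4.2: L D L^T - mu*I = L+ D+ L+^T (0-based lists)."""
    n = len(d)
    Dp = [0.0] * n
    Lp = [0.0] * (n - 1)
    Dp[0] = d[0] - mu
    for i in range(n - 1):
        Lp[i] = d[i] * l[i] / Dp[i]                                          # (4.4.18)
        Dp[i + 1] = d[i] * l[i] ** 2 + d[i + 1] - Lp[i] * d[i] * l[i] - mu   # (4.4.19)
    return Dp, Lp

def tridiagonal_of(d, l):
    """Diagonal and off-diagonal entries of the tridiagonal matrix L D L^T."""
    n = len(d)
    diag = [d[0]] + [d[i] * l[i] ** 2 + d[i + 1] for i in range(n - 1)]
    off = [d[i] * l[i] for i in range(n - 1)]
    return diag, off

d, l, mu = [2.0, 1.5, 1.0], [0.5, -0.25], 0.3
Dp, Lp = stqds(d, l, mu)
# L+ D+ L+^T should have the diagonal of L D L^T shifted by mu and the same off-diagonal.
print(tridiagonal_of(Dp, Lp))
print(tridiagonal_of(d, l))
```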

Using this intermediate variable, we get the so-called differential form of the stationary qd transformation (dstqds). This term was again coined by Rutishauser in the context of similar transformations in [121, 122].

Algorithm 4.4.3 (dstqds)

    $s_1 := -\mu$
    for $i = 1, n-1$
        $D_+(i) := s_i + d_i$
        $L_+(i) := (d_i l_i)/D_+(i)$
        $s_{i+1} := L_+(i)\, l_i\, s_i - \mu$
    end for
    $D_+(n) := s_n + d_n$
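A minimal Python sketch of the differential form follows, under the same assumptions as before (our own function name, 0-based lists, no protection against a zero $D_+(i)$). The check at the end verifies the defining identities of $L_+D_+L_+^T$ entry by entry.

```python
def dstqds(d, l, mu):
    """Sketch of Algorithm 4.4.3 (differential stationary qd with shift)."""
    n = len(d)
    Dp = [0.0] * n
    Lp = [0.0] * (n - 1)
    s = -mu                           # s_1
    for i in range(n - 1):
        Dp[i] = s + d[i]
        Lp[i] = d[i] * l[i] / Dp[i]
        s = Lp[i] * l[i] * s - mu     # s_{i+1}, see (4.4.21)
    Dp[n - 1] = s + d[n - 1]
    return Dp, Lp

d, l, mu = [2.0, 1.5, 1.0], [0.5, -0.25], 0.3
Dp, Lp = dstqds(d, l, mu)
for i in range(len(l)):
    # Off-diagonal entries are preserved: D+(i) L+(i) = d_i l_i.
    print(Dp[i] * Lp[i], d[i] * l[i])
    # Diagonal entries are shifted by mu.
    print(Dp[i] * Lp[i] ** 2 + Dp[i + 1], d[i] * l[i] ** 2 + d[i + 1] - mu)
```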

In  the  next  section  we  will  show  that  the  above  differential  algorithm  has  some 
nice  properties  in  the  face  of  roundoff  errors. 

We also need to compute the transformation

\[ LDL^T - \mu I = U_- D_- U_-^T, \]

which we call the “progressive quotient-difference with shift” (qds) transformation. The following algorithm gives an obvious way to implement this transformation.

Algorithm 4.4.4 (qds)

    $U_-(n) := 0$
    for $i = n-1, 1, -1$
        $D_-(i+1) := d_i l_i^2 + d_{i+1} - U_-(i+1)\, d_{i+1} l_{i+1} - \mu$    (4.4.22)
        $U_-(i) := (d_i l_i)/D_-(i+1)$    (4.4.23)
    end for
    $D_-(1) := d_1 - U_-(1)\, d_1 l_1 - \mu$


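As in the stationary transformation, we introduce the intermediate variable

\[ \begin{aligned} p_i &= D_-(i) - d_{i-1} l_{i-1}^2, && (4.4.24) \\ &= d_i - U_-(i)\, d_i l_i - \mu, && \text{by (4.4.22)} \\ &= \frac{d_i}{D_-(i+1)} \left( D_-(i+1) - d_i l_i^2 \right) - \mu, && \text{by (4.4.23)} \\ &= \frac{d_i}{D_-(i+1)} \cdot p_{i+1} - \mu. && (4.4.25) \end{aligned} \]

Using this intermediate variable, we get the differential form of the progressive qd transformation,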
Algorithm 4.4.5 (dqds)

    $p_n := d_n - \mu$
    for $i = n-1, 1, -1$
        $D_-(i+1) := d_i l_i^2 + p_{i+1}$
        $t := d_i / D_-(i+1)$
        $U_-(i) := l_i t$
        $p_i := p_{i+1} t - \mu$
    end for
    $D_-(1) := p_1$

Note that we have denoted the intermediate variables by the symbols $s_i$ and $p_i$ to stand for stationary and progressive respectively.
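The progressive differential transform can be sketched in the same style. Again this is a hedged transcription: the function name and 0-based indexing are ours, and a zero $D_-(i+1)$ is not handled.

```python
def dqds(d, l, mu):
    """Sketch of Algorithm 4.4.5: L D L^T - mu*I = U- D- U-^T (0-based lists)."""
    n = len(d)
    Dm = [0.0] * n
    Um = [0.0] * (n - 1)
    p = d[n - 1] - mu                 # p_n
    for i in range(n - 2, -1, -1):
        Dm[i + 1] = d[i] * l[i] ** 2 + p
        t = d[i] / Dm[i + 1]
        Um[i] = l[i] * t
        p = p * t - mu                # p_i, see (4.4.25)
    Dm[0] = p
    return Dm, Um

d, l, mu = [2.0, 1.5, 1.0], [0.5, -0.25], 0.3
Dm, Um = dqds(d, l, mu)
# Entry (i, i) of U- D- U-^T is D-(i) + U-(i)^2 D-(i+1); entry (i, i+1) is U-(i) D-(i+1).
print(Dm[0] + Um[0] ** 2 * Dm[1], d[0] - mu)
print(Um[0] * Dm[1], d[0] * l[0])
```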

As in Algorithm 3.2.1, we also need to find all the $\gamma_k$'s in order to choose the appropriate twisted factorization for computing the eigenvector. Since $(LDL^T)_{k,k+1} = d_k l_k$, by the fourth formula for $\gamma_k$ in Corollary 3.1.1, we have

\[ \begin{aligned} \gamma_k &= D_+(k) - \frac{(d_k l_k)^2}{D_-(k+1)} \\ &= s_k + d_k - \frac{(d_k l_k)^2}{D_-(k+1)}, \qquad \text{by (4.4.20)} \\ &= s_k + \frac{d_k}{D_-(k+1)} \left( D_-(k+1) - d_k l_k^2 \right). \end{aligned} \]

By (4.4.24), (4.4.25) and (4.4.21), we can express $\gamma_k$ as

\[ \gamma_k = \begin{cases} s_k + \dfrac{d_k}{D_-(k+1)} \cdot p_{k+1}, \\[1ex] s_k + p_k + \mu, \\[1ex] p_k + L_+(k-1)\, l_{k-1}\, s_{k-1}. \end{cases} \tag{4.4.26} \]

In the next section, we will see that the top and bottom formulae in (4.4.26) are “better” for computational purposes. We can now choose $r$ as the index where $|\gamma_k|$ is minimum. The twisted factorization at position $r$ is given by

\[ LDL^T - \mu I = N_r D_r N_r^T, \]

where $D_r = \mathrm{diag}(D_+(1), \ldots, D_+(r-1), \gamma_r, D_-(r+1), \ldots, D_-(n))$ and $N_r$ is the corresponding twisted factor (see (3.1.10)). It may be formed by the following “differential twisted quotient-difference with shift” (dtwqds) transformation.




Algorithm 4.4.6 (dtwqds)

    $s_1 := -\mu$
    for $i = 1, r-1$
        $D_+(i) := s_i + d_i$
        $L_+(i) := (d_i l_i)/D_+(i)$
        $s_{i+1} := L_+(i)\, l_i\, s_i - \mu$
    end for
    $p_n := d_n - \mu$
    for $i = n-1, r, -1$
        $D_-(i+1) := d_i l_i^2 + p_{i+1}$
        $t := d_i / D_-(i+1)$
        $U_-(i) := l_i t$
        $p_i := p_{i+1} t - \mu$
    end for
    $\gamma_r := s_r + \dfrac{d_r}{D_-(r+1)} \cdot p_{r+1}$

Note: In cases where we have already computed the stationary and progressive transformations, i.e., we have computed $L_+$, $D_+$, $U_-$ and $D_-$, the only additional work needed for dtwqds is one multiplication and one addition to compute $\gamma_r$.
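To make the choice of the twist index concrete, the sketch below runs both differential transforms in full, evaluates every $\gamma_k$ by the top formula of (4.4.26), and returns the index of smallest magnitude. The function name is hypothetical, indices are 0-based, and for the last index (where $D_-(n+1)$ does not exist) the sketch falls back on the middle formula $s_n + p_n + \mu$; that fallback is our choice, not something stated in the text.

```python
def twisted_gammas(d, l, mu):
    """Return (gamma, r, Dp, Dm): all gamma_k and the index r minimizing |gamma_k|."""
    n = len(d)
    # Stationary sweep (dstqds), keeping every s_k.
    s = [0.0] * n
    Dp = [0.0] * n
    s[0] = -mu
    for i in range(n - 1):
        Dp[i] = s[i] + d[i]
        Lp_i = d[i] * l[i] / Dp[i]
        s[i + 1] = Lp_i * l[i] * s[i] - mu
    Dp[n - 1] = s[n - 1] + d[n - 1]
    # Progressive sweep (dqds), keeping every p_k.
    p = [0.0] * n
    Dm = [0.0] * n
    p[n - 1] = d[n - 1] - mu
    for i in range(n - 2, -1, -1):
        Dm[i + 1] = d[i] * l[i] ** 2 + p[i + 1]
        p[i] = p[i + 1] * (d[i] / Dm[i + 1]) - mu
    Dm[0] = p[0]
    # Top formula of (4.4.26); middle formula for the final index.
    gamma = [s[k] + (d[k] / Dm[k + 1]) * p[k + 1] for k in range(n - 1)]
    gamma.append(s[n - 1] + p[n - 1] + mu)
    r = min(range(n), key=lambda k: abs(gamma[k]))
    return gamma, r, Dp, Dm

gamma, r, Dp, Dm = twisted_gammas([2.0, 1.5, 1.0], [0.5, -0.25], 0.3)
print(gamma, r)
```

Two sanity checks follow from the definitions: the first $\gamma$ coincides with $D_-(1)$ and the last with $D_+(n)$, since twisting at either end reproduces the purely progressive or purely stationary factorization.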

In  the  next  section,  we  exhibit  desirable  properties  of  the  differential  forms  of 
our  qd-like  transformations  in  the  face  of  roundoff  errors.  Before  we  do  so,  we  emphasize 
that  the  particular  qd-like  transformations  presented  in  this  section  are  new.  Similar  qd 
recurrences  have  been  studied  by  Rutishauser  [121,  122,  123],  Henrici  [76],  Fernando  and 
Parlett  [56],  Yao  Yang  [138]  and  David  Day  [28]. 


4.4.2  Roundoff  Error  Analysis 

First, we introduce our model of arithmetic. We assume that the floating point result of a basic arithmetic operation $\circ$ satisfies

\[ fl(x \circ y) = (x \circ y)(1 + \eta) = (x \circ y)/(1 + \delta) \]

where $\eta$ and $\delta$ depend on $x$, $y$, $\circ$, and the arithmetic unit but satisfy

\[ |\eta| < \varepsilon, \qquad |\delta| < \varepsilon \]

for a given $\varepsilon$ that depends only on the arithmetic unit. We shall choose freely the form ($\eta$ or $\delta$) that suits the analysis. As usual, we will ignore $O(\varepsilon^2)$ terms in our analyses. We also adopt the convention of denoting the computed value of $x$ by $\hat x$.
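The model can be checked experimentally with exact rational arithmetic. The sketch below assumes IEEE double precision with round-to-nearest, for which the bound on $|\eta|$ and $|\delta|$ is the unit roundoff $2^{-53}$; the helper name is ours.

```python
from fractions import Fraction
import operator

UNIT_ROUNDOFF = Fraction(1, 2 ** 53)  # assumption: IEEE double, round-to-nearest

def eta_and_delta(x, y, op):
    """Return (eta, delta) with fl(x op y) = exact*(1 + eta) = exact/(1 + delta)."""
    exact = op(Fraction(x), Fraction(y))   # exact rational result
    computed = Fraction(op(x, y))          # floating point result, converted exactly
    eta = computed / exact - 1
    delta = exact / computed - 1
    return eta, delta

for op in (operator.add, operator.sub, operator.mul, operator.truediv):
    eta, delta = eta_and_delta(0.1, 0.3, op)
    print(op.__name__, abs(eta) <= UNIT_ROUNDOFF, abs(delta) <= UNIT_ROUNDOFF)
```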

Ideally,  we  would  like  to  show  that  the  differential  qd  transformations  introduced 
in  the  previous  section  produce  an  output  that  is  exact  for  data  that  is  very  close  to  the  input 
matrix.  Since  we  desire  relative  accuracy,  we  would  like  this  backward  error  to  be  relative. 
However,  our  algorithms  do  not  admit  such  a  pure  backward  analysis  (see  [138,  111]  for  a 
backward  analysis  where  the  backward  errors  are  absolute  but  not  relative).  Nevertheless, 
we  will  give  a  hybrid  interpretation  involving  both  backward  and  forward  relative  errors. 
Our  error  analysis  is  on  the  lines  of  that  presented  in  [56]. 

The best way to understand our first result is by studying Figure 4.1. Following Rutishauser, we merge elements of $L$ and $D$ into a single array,

\[ Z := \{d_1, l_1, d_2, l_2, \ldots, d_{n-1}, l_{n-1}, d_n\}. \]

Likewise, the array $\vec Z$ is made up of elements $\vec d_i$ and $\vec l_i$, $\hat Z_+$ contains elements $\hat D_+(i)$, $\hat L_+(i)$ and so on. The acronym ulp in Figure 4.1 stands for units in the last place held. It is the natural way to refer to relative differences between numbers. When a result is correctly rounded the error is not more than half an ulp.
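As a small illustration of the half-ulp statement, Python's `math.ulp` (available from Python 3.9) gives the spacing of double precision numbers at a given value. The helper below is our own and measures an error in ulps of the computed number.

```python
import math
from fractions import Fraction

def error_in_ulps(computed, exact):
    """Distance between a float and an exact rational value, in ulps of the float."""
    return abs(Fraction(computed) - exact) / Fraction(math.ulp(computed))

# One ulp at 1.0 is 2**-52 in IEEE double precision.
print(math.ulp(1.0) == 2.0 ** -52)

# The division 1.0 / 3.0 is correctly rounded, so its error is at most half an ulp.
print(float(error_in_ulps(1.0 / 3.0, Fraction(1, 3))))
```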

In all our results of this section, numbers in the computer are represented by letters without any overbar, such as $Z$, or by “hatted” symbols, such as $\hat Z_+$. For example in Figure 4.1, $Z$ represents the input data while $\hat Z_+$ represents the output data obtained by executing the dstqds algorithm in finite precision. Intermediate arrays, such as $\vec Z$ and $\overset{\frown}{Z}_+$, are introduced for our analysis but are typically unrepresentable in a computer’s limited precision. Note that we have chosen the symbols $\rightarrow$ and $\frown$ in Figure 4.1 to indicate a process that takes rows and columns in increasing order, i.e., from “left to right” and “top to bottom”. Later, in Figure 4.2 we use $\leftarrow$ and $\smile$ to indicate a “right to left” and “bottom to top” process.

Figure 4.1 states that the computed outputs of the dstqds transformation (see Algorithm 4.4.3), $\hat D_+(i)$ and $\hat L_+(i)$, are small relative perturbations of the quantities $\overset{\frown}{D}_+(i)$ and $\overset{\frown}{L}_+(i)$, which in turn are the results of an exact dstqds transformation applied to the perturbed matrix represented by $\vec Z$. The elements of $\vec Z$ are obtained by small relative changes in the inputs $L$ and $D$. Analogous results hold for the dqds and dtwqds transformations (see Algorithms 4.4.5 and 4.4.6). As we mentioned above, this is not a pure backward error analysis. We have put small perturbations not only on the input but also on the output in order to obtain an exact dstqds transform. This property is called mixed stability in [30] but note that our perturbations are relative ones.

[Commutative diagram. Top row: $Z \to \hat Z_+$ by dstqds (computed). Bottom row: $\vec Z \to \overset{\frown}{Z}_+$ by dstqds (exact). Left edge, $Z \to \vec Z$: change each $d_i$ by 1 ulp, $l_i$ by 3 ulps. Right edge, $\hat Z_+ \to \overset{\frown}{Z}_+$: change each $\hat D_+(i)$ by 2 ulps, $\hat L_+(i)$ by 3 ulps.]

Figure 4.1: Effects of roundoff - dstqds transformation

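Theorem 4.4.2 Let the dstqds transformation be computed as in Algorithm 4.4.3. In the absence of overflow and underflow, the diagram in Figure 4.1 commutes and $\vec d_i$ ($\vec l_i$) differs from $d_i$ ($l_i$) by 1 (3) ulps, while $\hat D_+(i)$ ($\hat L_+(i)$) differs from $\overset{\frown}{D}_+(i)$ ($\overset{\frown}{L}_+(i)$) by 2 (3) ulps.

Proof. We write down the exact equations satisfied by the computed quantities.

\[ \begin{aligned} \hat D_+(i) &= (\hat s_i + d_i)/(1 + \varepsilon_+), \\ \hat L_+(i) &= d_i l_i (1 + \varepsilon_*)(1 + \varepsilon_/)/\hat D_+(i) = \frac{d_i l_i (1 + \varepsilon_*)(1 + \varepsilon_/)(1 + \varepsilon_+)}{\hat s_i + d_i}, \\ \text{and} \quad \hat s_{i+1} &= \frac{\hat L_+(i)\, l_i\, \hat s_i (1 + \varepsilon_\circ)(1 + \varepsilon_{**}) - \mu}{1 + \varepsilon_{i+1}}. \end{aligned} \]

In the above, all $\varepsilon$’s depend on $i$ but we have chosen to single out the one that accounts for the subtraction as it is the only one where the dependence on $i$ must be made explicit. In more detail the last relation is

\[ (1 + \varepsilon_{i+1})\, \hat s_{i+1} = \frac{d_i l_i^2\, \hat s_i}{\hat s_i + d_i} (1 + \varepsilon_*)(1 + \varepsilon_/)(1 + \varepsilon_+)(1 + \varepsilon_\circ)(1 + \varepsilon_{**}) - \mu. \]

The trick is to define $\vec d_i$ and $\vec l_i$ so that the exact dstqds relation

\[ \vec s_{i+1} = \frac{\vec d_i\, \vec l_i^{\,2}\, \vec s_i}{\vec s_i + \vec d_i} - \mu \tag{4.4.27} \]

is satisfied. This may be achieved by setting

\[ \begin{aligned} \vec d_i &= d_i (1 + \varepsilon_i), \\ \vec s_i &= \hat s_i (1 + \varepsilon_i), \\ \vec l_i &= l_i \sqrt{\frac{(1 + \varepsilon_*)(1 + \varepsilon_/)(1 + \varepsilon_+)(1 + \varepsilon_\circ)(1 + \varepsilon_{**})}{1 + \varepsilon_i}}. \end{aligned} \tag{4.4.28} \]

In order to satisfy the exact mathematical relations of dstqds,

\[ \overset{\frown}{D}_+(i) = \vec s_i + \vec d_i, \tag{4.4.29} \]

\[ \overset{\frown}{L}_+(i) = \frac{\vec d_i\, \vec l_i}{\vec s_i + \vec d_i}, \tag{4.4.30} \]

we set

\[ \begin{aligned} \overset{\frown}{D}_+(i) &= \hat D_+(i)(1 + \varepsilon_+)(1 + \varepsilon_i), \\ \overset{\frown}{L}_+(i) &= \hat L_+(i) \sqrt{\frac{(1 + \varepsilon_\circ)(1 + \varepsilon_{**})}{(1 + \varepsilon_*)(1 + \varepsilon_/)(1 + \varepsilon_+)(1 + \varepsilon_i)}}, \end{aligned} \tag{4.4.31} \]

and the result holds. □

A similar result holds for the dqds transformation.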


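Theorem 4.4.3 Let the dqds transformation be computed as in Algorithm 4.4.5. In the absence of overflow and underflow, the diagram in Figure 4.2 commutes and $\overleftarrow{d}_i$ ($\overleftarrow{l}_i$) differs from $d_i$ ($l_i$) by 3 (3) ulps, while $\hat D_-(i)$ ($\hat U_-(i)$) differs from $\overset{\smile}{D}_-(i)$ ($\overset{\smile}{U}_-(i)$) by 2 (4) ulps.

[Commutative diagram. Top row: $Z \to \hat Z_-$ by dqds (computed). Bottom row: $\overleftarrow{Z} \to \overset{\smile}{Z}_-$ by dqds (exact). Left edge, $Z \to \overleftarrow{Z}$: change each $d_i$ by 3 ulps, $l_i$ by 3 ulps. Right edge, $\hat Z_- \to \overset{\smile}{Z}_-$: change each $\hat D_-(i)$ by 2 ulps, $\hat U_-(i)$ by 4 ulps.]

Figure 4.2: Effects of roundoff - dqds transformation

Proof. The proof is similar to that of Theorem 4.4.2. The computed quantities satisfy

\[ \hat D_-(i+1) = \left( d_i l_i^2 (1 + \varepsilon_*)(1 + \varepsilon_{**}) + \hat p_{i+1} \right)/(1 + \varepsilon_+), \tag{4.4.32} \]

\[ \begin{aligned} \hat t &= d_i (1 + \varepsilon_/)/\hat D_-(i+1), \\ \hat U_-(i) &= l_i\, \hat t\, (1 + \varepsilon_\circ) = \frac{d_i l_i (1 + \varepsilon_/)(1 + \varepsilon_\circ)(1 + \varepsilon_+)}{d_i l_i^2 (1 + \varepsilon_*)(1 + \varepsilon_{**}) + \hat p_{i+1}}, \\ \hat p_i &= \frac{(d_i/\hat D_-(i+1))\, \hat p_{i+1} (1 + \varepsilon_/)(1 + \varepsilon_{\circ\circ}) - \mu}{1 + \varepsilon_i} \\ \Rightarrow \quad (1 + \varepsilon_i)\, \hat p_i &= \frac{d_i\, \hat p_{i+1}}{d_i l_i^2 (1 + \varepsilon_*)(1 + \varepsilon_{**}) + \hat p_{i+1}} (1 + \varepsilon_/)(1 + \varepsilon_{\circ\circ})(1 + \varepsilon_+) - \mu. \end{aligned} \]

Note that the above $\varepsilon$’s are different from the ones in the proof of the earlier Theorem 4.4.2. As in Theorem 4.4.2, the trick is to satisfy the exact relation,

\[ \overleftarrow{p}_i = \frac{\overleftarrow{d}_i\, \overleftarrow{p}_{i+1}}{\overleftarrow{d}_i\, \overleftarrow{l}_i^{\,2} + \overleftarrow{p}_{i+1}} - \mu, \tag{4.4.33} \]

which is achieved by setting

\[ \overleftarrow{d}_i = d_i (1 + \varepsilon_/)(1 + \varepsilon_{\circ\circ})(1 + \varepsilon_+), \]

\[ \overleftarrow{p}_i = \hat p_i (1 + \varepsilon_i), \tag{4.4.34} \]

\[ \text{and} \quad \overleftarrow{l}_i = l_i \sqrt{\frac{(1 + \varepsilon_*)(1 + \varepsilon_{**})(1 + \varepsilon_{i+1})}{(1 + \varepsilon_/)(1 + \varepsilon_{\circ\circ})(1 + \varepsilon_+)}}, \tag{4.4.35} \]

\[ \text{so that} \quad \overleftarrow{d}_i\, \overleftarrow{l}_i^{\,2} = d_i l_i^2 (1 + \varepsilon_*)(1 + \varepsilon_{**})(1 + \varepsilon_{i+1}). \]

The other dqds relations,

\[ \overset{\smile}{D}_-(i+1) = \overleftarrow{d}_i\, \overleftarrow{l}_i^{\,2} + \overleftarrow{p}_{i+1}, \tag{4.4.36} \]

\[ \overset{\smile}{U}_-(i) = \frac{\overleftarrow{d}_i\, \overleftarrow{l}_i}{\overleftarrow{d}_i\, \overleftarrow{l}_i^{\,2} + \overleftarrow{p}_{i+1}}, \tag{4.4.37} \]

may be satisfied by setting

\[ \begin{aligned} \overset{\smile}{D}_-(i+1) &= \hat D_-(i+1)(1 + \varepsilon_+)(1 + \varepsilon_{i+1}), \\ \overset{\smile}{U}_-(i) &= \frac{\hat U_-(i)}{1 + \varepsilon_\circ} \sqrt{\frac{(1 + \varepsilon_*)(1 + \varepsilon_{**})(1 + \varepsilon_{\circ\circ})}{(1 + \varepsilon_/)(1 + \varepsilon_+)(1 + \varepsilon_{i+1})}}. \end{aligned} \tag{4.4.38} \]

□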
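[Commutative diagram. Top row: $Z \to \hat Z_k$ by dtwqds (computed). Bottom row: $\bar Z \to \tilde Z_k$ by dtwqds (exact). Left edge, $Z \to \bar Z$: change each $d_i$ by 1 ulp, $1 \le i < k$; $l_i$ by 3 ulps, $1 \le i < k$; $d_k$ by 4 ulps, $l_k$ by $3\frac{1}{2}$ ulps; $d_i$ by 3 ulps, $k < i \le n$; $l_i$ by 3 ulps, $k < i < n$. Right edge, $\hat Z_k \to \tilde Z_k$: change each $\hat D_+(i)$ by 2 ulps, $1 \le i < k$; $\hat L_+(i)$ by 3 ulps, $1 \le i < k$; $\hat\gamma_k$ by 2 ulps, $\hat U_-(k)$ by $4\frac{1}{2}$ ulps; $\hat D_-(i)$ by 2 ulps, $k < i \le n$; $\hat U_-(i)$ by 4 ulps, $k < i < n$.]

Figure 4.3: Effects of roundoff - dtwqds transformation

By combining parts of the analyses for the dstqds and dqds transformations, we can also exhibit a similar result for the twisted factorization computed by Algorithm 4.4.6. In Figure 4.3, the various $Z$ arrays represent corresponding twisted factors that may be obtained by “concatenating” the stationary and progressive factors. In particular, for any twist position $k$,

\[ \begin{aligned} \hat Z_k &:= \{\hat D_+(1), \hat L_+(1), \ldots, \hat L_+(k-1), \hat\gamma_k, \hat U_-(k), \ldots, \hat U_-(n-1), \hat D_-(n)\}, \\ \tilde Z_k &:= \{\overset{\frown}{D}_+(1), \overset{\frown}{L}_+(1), \ldots, \overset{\frown}{L}_+(k-1), \tilde\gamma_k, \overset{\smile}{U}_-(k), \ldots, \overset{\smile}{U}_-(n-1), \overset{\smile}{D}_-(n)\}, \end{aligned} \]

while

\[ \bar Z := \{\vec d_1, \vec l_1, \ldots, \vec l_{k-1}, \bar d_k, \bar l_k, \overleftarrow{d}_{k+1}, \ldots, \overleftarrow{l}_{n-1}, \overleftarrow{d}_n\}. \]

$\hat Z_k$ and $\tilde Z_k$ represent the twisted factorizations

\[ \hat N_k \hat D_k \hat N_k^T \quad \text{and} \quad \tilde N_k \tilde D_k \tilde N_k^T \]

respectively (note that $\sim$ is a concatenation of the symbols $\frown$ and $\smile$, while $-$ may also be derived by concatenating $\leftarrow$ and $\rightarrow$).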

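Theorem 4.4.4 Let the dtwqds transformation be computed as in Algorithm 4.4.6. In the absence of overflow and underflow, the diagram in Figure 4.3 commutes and $\vec d_i$ ($\vec l_i$) differs from $d_i$ ($l_i$) by 1 (3) ulps for $1 \le i < k$, $\bar d_k$ ($\bar l_k$) differs from $d_k$ ($l_k$) by 4 ($3\frac{1}{2}$) ulps, while $\overleftarrow{d}_i$ ($\overleftarrow{l}_i$) differs from $d_i$ ($l_i$) by 3 (3) ulps for $k < i \le n$. On the output side, $\hat D_+(i)$ ($\hat L_+(i)$) differs from $\overset{\frown}{D}_+(i)$ ($\overset{\frown}{L}_+(i)$) by 2 (3) ulps for $1 \le i < k$, $\hat\gamma_k$ ($\hat U_-(k)$) differs from $\tilde\gamma_k$ ($\tilde U_-(k)$) by 2 ($4\frac{1}{2}$) ulps, while $\hat D_-(i)$ ($\hat U_-(i)$) differs from $\overset{\smile}{D}_-(i)$ ($\overset{\smile}{U}_-(i)$) by 2 (4) ulps for $k < i \le n$.

Proof. The crucial observation is that for the exact stationary transformation ((4.4.27), (4.4.29) and (4.4.30)) to be satisfied for $1 \le i \le k-1$, roundoff errors need to be put only on $d_1, d_2, \ldots, d_{k-1}$ and $l_1, l_2, \ldots, l_{k-1}$. Similarly for the progressive transformation ((4.4.33), (4.4.36) and (4.4.37)) to hold for $k+1 \le i \le n$, roundoff errors need to be put only on the bottom part of the matrix, i.e., on $d_{k+1}, \ldots, d_n$ and $l_{k+1}, \ldots, l_{n-1}$. By the top formula in (4.4.26),

\[ \hat\gamma_k = \left( \hat s_k + \frac{d_k}{\hat D_-(k+1)}\, \hat p_{k+1} (1 + \varepsilon_/^-)(1 + \varepsilon_{\circ\circ}^-) \right) \Big/ (1 + \hat\varepsilon_k). \]

Note that in the above, we have put the superscript $-$ on some $\varepsilon$’s to indicate that they are identical to the corresponding $\varepsilon$’s in the proof of Theorem 4.4.3. By (4.4.28) and (4.4.32),

\[ \begin{aligned} (1 + \hat\varepsilon_k)\, \hat\gamma_k &= \hat s_k + \frac{\hat p_{k+1} \cdot d_k (1 + \varepsilon_/^-)(1 + \varepsilon_{\circ\circ}^-)(1 + \varepsilon_+^-)}{d_k l_k^2 (1 + \varepsilon_*^-)(1 + \varepsilon_{**}^-) + \hat p_{k+1}} \\ \Rightarrow \quad (1 + \hat\varepsilon_k)(1 + \varepsilon_k^+)\, \hat\gamma_k &= \vec s_k + \frac{\hat p_{k+1}(1 + \varepsilon_{k+1}^-) \cdot d_k (1 + \varepsilon_/^-)(1 + \varepsilon_{\circ\circ}^-)(1 + \varepsilon_+^-)(1 + \varepsilon_k^+)}{d_k l_k^2 (1 + \varepsilon_*^-)(1 + \varepsilon_{**}^-)(1 + \varepsilon_{k+1}^-) + \hat p_{k+1}(1 + \varepsilon_{k+1}^-)}. \end{aligned} \]

Note that we are free to attribute roundoff errors to $d_k$ and $l_k$ in order to preserve exact mathematical relations at the twist position $k$. In particular, by setting

\[ \begin{aligned} \tilde\gamma_k &= \hat\gamma_k (1 + \hat\varepsilon_k)(1 + \varepsilon_k^+), \\ \bar d_k &= d_k (1 + \varepsilon_/^-)(1 + \varepsilon_{\circ\circ}^-)(1 + \varepsilon_+^-)(1 + \varepsilon_k^+), \\ \bar l_k &= l_k \sqrt{\frac{(1 + \varepsilon_*^-)(1 + \varepsilon_{**}^-)(1 + \varepsilon_{k+1}^-)}{(1 + \varepsilon_/^-)(1 + \varepsilon_{\circ\circ}^-)(1 + \varepsilon_+^-)(1 + \varepsilon_k^+)}} \end{aligned} \]

and recalling that $\overleftarrow{p}_{k+1} = \hat p_{k+1}(1 + \varepsilon_{k+1}^-)$ (see (4.4.34)), the following exact relation holds,

\[ \tilde\gamma_k = \vec s_k + \frac{\bar d_k\, \overleftarrow{p}_{k+1}}{\bar d_k\, \bar l_k^{\,2} + \overleftarrow{p}_{k+1}}. \]

In addition, the exact relation

\[ \tilde U_-(k) = \frac{\bar d_k\, \bar l_k}{\bar d_k\, \bar l_k^{\,2} + \overleftarrow{p}_{k+1}} \]

holds if we set

\[ \tilde U_-(k) = \frac{\hat U_-(k)}{1 + \varepsilon_\circ^-} \sqrt{\frac{(1 + \varepsilon_*^-)(1 + \varepsilon_{**}^-)(1 + \varepsilon_{\circ\circ}^-)(1 + \varepsilon_k^+)}{(1 + \varepsilon_/^-)(1 + \varepsilon_{k+1}^-)(1 + \varepsilon_+^-)}}, \tag{4.4.39} \]

where $\varepsilon_\circ^-$ is identical to the $\varepsilon_\circ$ of (4.4.38). □

Note: A similar result may be obtained if $\gamma_k$ is computed by the last formula in (4.4.26).

[Commutative diagram. Top row: $T = (a_k, b_k) \to (\hat d_k, \hat l_k)$ by the $LDL^T$ decomposition (computed). Bottom row: $\bar T = (a_k, \bar b_k) \to (\bar d_k, \bar l_k)$ by the $LDL^T$ decomposition (exact). Left edge: change each $b_k$ by $2\frac{1}{2}$ ulps. Right edge: change each $\hat d_k$ by 2 ulps, $\hat l_k$ by $2\frac{1}{2}$ ulps.]

Figure 4.4: Effects of roundoff - $LDL^T$ decomposition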

Before we proceed to the next section, we give an algorithm and error analysis for the initial decomposition

\[ T + \mu I = LDL^T. \]

We denote the diagonal elements of $T$ by $a_i$ and off-diagonal elements by $b_i$.

Algorithm 4.4.7 [Computes the initial $LDL^T$ decomposition.]

    $d_1 := a_1 + \mu$
    for $i = 1, n-1$
        $l_i := b_i / d_i$
        $d_{i+1} := (a_{i+1} + \mu) - l_i b_i$
    end for
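Algorithm 4.4.7 translates almost line by line into Python. The sketch below uses our own function name and 0-based lists (`a` of length $n$, `b` of length $n-1$); it does not guard against a zero pivot $d_i$, which cannot occur in exact arithmetic when $T + \mu I$ is definite.

```python
def ldl_tridiagonal(a, b, mu=0.0):
    """Sketch of Algorithm 4.4.7: T + mu*I = L D L^T for a symmetric tridiagonal T."""
    n = len(a)
    d = [0.0] * n
    l = [0.0] * (n - 1)
    d[0] = a[0] + mu
    for i in range(n - 1):
        l[i] = b[i] / d[i]
        d[i + 1] = (a[i + 1] + mu) - l[i] * b[i]
    return d, l

# Second-difference matrix with diagonal 2 and off-diagonal -1 (positive definite).
d, l = ldl_tridiagonal([2.0, 2.0, 2.0], [-1.0, -1.0])
print(d)  # pivots close to 2, 3/2, 4/3
print(l)  # multipliers close to -1/2, -2/3
```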
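Theorem 4.4.5 Let the $LDL^T$ decomposition be computed as in Algorithm 4.4.7. In the absence of overflow and underflow, the diagram in Figure 4.4 commutes and $\bar b_i$ differs from $b_i$ by $2\frac{1}{2}$ ulps, while $\hat d_i$ ($\hat l_i$) differs from $\bar d_i$ ($\bar l_i$) by 2 ($2\frac{1}{2}$) ulps.

Proof. The computed quantities satisfy

\[ \begin{aligned} \hat l_i &= \frac{b_i}{\hat d_i} (1 + \varepsilon_/), \\ \hat d_{i+1} &= \left( \frac{a_{i+1} + \mu}{1 + \varepsilon_{i+1}^+} - b_i\, \hat l_i (1 + \varepsilon_*) \right) \Big/ (1 + \varepsilon_{i+1}^-) \\ \Rightarrow \quad (1 + \varepsilon_{i+1}^+)(1 + \varepsilon_{i+1}^-)\, \hat d_{i+1} &= (a_{i+1} + \mu) - \frac{b_i^2}{\hat d_i} (1 + \varepsilon_/)(1 + \varepsilon_*)(1 + \varepsilon_{i+1}^+). \end{aligned} \]

By setting

\[ \begin{aligned} \bar d_{i+1} &= \hat d_{i+1}(1 + \varepsilon_{i+1}^+)(1 + \varepsilon_{i+1}^-), \qquad \bar d_1 = \hat d_1 (1 + \varepsilon_1^+), \\ \bar l_i &= \hat l_i \sqrt{\frac{(1 + \varepsilon_*)(1 + \varepsilon_{i+1}^+)}{(1 + \varepsilon_/)(1 + \varepsilon_i^+)(1 + \varepsilon_i^-)}}, \\ \bar b_i &= b_i \sqrt{(1 + \varepsilon_/)(1 + \varepsilon_*)(1 + \varepsilon_{i+1}^+)(1 + \varepsilon_i^+)(1 + \varepsilon_i^-)}, \end{aligned} \tag{4.4.40} \]

the following exact relations hold

\[ \bar d_{i+1} = a_{i+1} + \mu - \frac{\bar b_i^{\,2}}{\bar d_i}, \qquad \bar l_i = \frac{\bar b_i}{\bar d_i}. \]

□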


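We can obtain a purely backward error analysis in the above case by showing that the $LDL^T$ decomposition computed by Algorithm 4.4.7 is exact for $T + \delta T$, where $\delta T$ represents an absolute perturbation in the nonzero elements of $T$.

Theorem 4.4.6 In the absence of overflow and underflow, the $LDL^T$ decomposition computed by Algorithm 4.4.7 is exact for a slightly perturbed matrix $T + \delta T$, i.e.,

\[ T + \delta T + \mu I = \hat L \hat D \hat L^T, \]

where

\[ |\delta b_i| = |\eta_{i1}\, \hat d_i \hat l_i + \eta_{i2}\, b_i|, \qquad \eta_{i1}, \eta_{i2} < 2.5\varepsilon, \tag{4.4.41} \]

\[ |\delta a_{i+1}| = |\eta_{i3}\, \hat d_i \hat l_i^2 + \eta_{i4}\, \hat d_{i+1}|, \qquad \eta_{i3} < 3\varepsilon, \; \eta_{i4} < 2\varepsilon. \tag{4.4.42} \]

Proof. From Theorem 4.4.5, the following relation holds

\[ \bar T + \mu I = \bar L \bar D \bar L^T, \tag{4.4.43} \]

where $\bar T$, $\bar L$ and $\bar D$ are as in Figure 4.4 and Theorem 4.4.5. Equating the diagonal and off-diagonal elements of (4.4.43), we get

\[ \bar b_i = \bar d_i \bar l_i, \tag{4.4.44} \]

\[ a_{i+1} + \mu = \bar d_i \bar l_i^{\,2} + \bar d_{i+1}. \tag{4.4.45} \]

By Theorem 4.4.5,

\[ \begin{aligned} \bar d_i \bar l_i &= \hat d_i \hat l_i \sqrt{\frac{(1 + \varepsilon_i^+)(1 + \varepsilon_i^-)(1 + \varepsilon_*)(1 + \varepsilon_{i+1}^+)}{1 + \varepsilon_/}} = \hat d_i \hat l_i (1 + \eta_{i1}), \\ \bar d_i \bar l_i^{\,2} &= \hat d_i \hat l_i^2\, \frac{(1 + \varepsilon_*)(1 + \varepsilon_{i+1}^+)}{1 + \varepsilon_/} = \hat d_i \hat l_i^2 (1 + \eta_{i3}), \\ \bar d_{i+1} &= \hat d_{i+1}(1 + \varepsilon_{i+1}^+)(1 + \varepsilon_{i+1}^-) = \hat d_{i+1}(1 + \eta_{i4}), \end{aligned} \]

where $\eta_{i1} < 2.5\varepsilon$, $\eta_{i3} < 3\varepsilon$ and $\eta_{i4} < 2\varepsilon$. Substituting the above in (4.4.44) and (4.4.45) we get

\[ \begin{aligned} b_i + (\bar b_i - b_i) - \eta_{i1}\, \hat d_i \hat l_i &= \hat d_i \hat l_i, \\ a_{i+1} - (\eta_{i3}\, \hat d_i \hat l_i^2 + \eta_{i4}\, \hat d_{i+1}) + \mu &= \hat d_i \hat l_i^2 + \hat d_{i+1}. \end{aligned} \]

The result now follows by recalling the relation between $\bar b_i$ and $b_i$ given in (4.4.40). □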

The backward error given above is small when there is no “element growth” in the $LDL^T$ decomposition. The following lemma proves the well known fact that no element growth is obtained when factoring a positive definite tridiagonal matrix.

Lemma 4.4.1 Suppose $\bar T$ is a positive definite tridiagonal matrix. Its $LDL^T$ factorization, i.e.,

\[ \bar T = \bar L \bar D \bar L^T \]

satisfies

\[ \begin{aligned} & 0 < \bar d_i \le \bar T(i,i) \le \|\bar T\|, && \text{for } i = 1, 2, \ldots, n, \\ & 0 \le \bar d_i \bar l_i^{\,2} \le \bar T(i+1,i+1) \le \|\bar T\|, && \text{for } i = 1, \ldots, n-1, \\ \text{and} \quad & \bar d_i \bar l_i = \bar T(i+1,i) \;\Rightarrow\; |\bar d_i \bar l_i| \le \|\bar T\|, && \text{for } i = 1, \ldots, n-1. \end{aligned} \]

Proof. The proof is easily obtained by noting that

\[ \bar d_{i-1} \bar l_{i-1}^{\,2} + \bar d_i = \bar T(i,i), \qquad \bar d_i \bar l_i = \bar T(i+1,i) \]

and using properties of a positive definite matrix by which the diagonal elements of $\bar T$ and $\bar D$ must be positive. □

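Note that in the above lemma, we do not claim that the elements of $\bar L$ are bounded by $\|\bar T\|$. Indeed, the elements of $\bar L$ can be arbitrarily large as seen from the following example,

\[ \begin{pmatrix} \varepsilon^{2p} & \varepsilon^p \\ \varepsilon^p & 1 + \varepsilon \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ \varepsilon^{-p} & 1 \end{pmatrix} \begin{pmatrix} \varepsilon^{2p} & 0 \\ 0 & \varepsilon \end{pmatrix} \begin{pmatrix} 1 & \varepsilon^{-p} \\ 0 & 1 \end{pmatrix}. \]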
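The example can be verified in exact rational arithmetic. The values $\varepsilon = 10^{-3}$ and $p = 3$ below are arbitrary choices for illustration, and the helper name is ours.

```python
from fractions import Fraction

def ldl_2x2(t11, t21, t22):
    """Exact LDL^T factors (d1, l1, d2) of the symmetric matrix [[t11, t21], [t21, t22]]."""
    d1 = t11
    l1 = t21 / d1
    d2 = t22 - l1 * t21
    return d1, l1, d2

eps, p = Fraction(1, 1000), 3          # illustrative values only
d1, l1, d2 = ldl_2x2(eps ** (2 * p), eps ** p, 1 + eps)

# The pivots stay below the largest matrix entry, but the multiplier is eps**(-p).
print(d1, d2, l1)
```

With these values the multiplier equals $10^9$ even though every entry of the matrix is at most $1 + \varepsilon$, which is the point of the remark above: bounded pivots do not imply bounded entries of the unit bidiagonal factor.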

Corollary 4.4.1 Suppose $T + \mu I$ is a positive definite tridiagonal matrix. In the absence of overflow and underflow, Algorithm 4.4.7 computes $\hat L$ and $\hat D$ such that

\[ T + \delta T + \mu I = \hat L \hat D \hat L^T, \]

where

\[ |\delta b_i| < 5\varepsilon |b_i|, \qquad |\delta a_{i+1}| < 3\varepsilon |a_{i+1} + \mu|, \]

and since $|\mu| \le \|T\|$,

\[ \|\delta T\|_1 < 5\varepsilon \|T\|_1 + 3\varepsilon |\mu| < 8\varepsilon \|T\|_1. \]

Proof. The result follows by recalling that $\bar d_i$ and $\bar l_i$ are obtained by small relative changes in $\hat d_i$ and $\hat l_i$ respectively, and then substituting the inequalities of Lemma 4.4.1 in (4.4.41) and (4.4.42). □

Note: The backward error on $T$ is relative if $\mu = 0$.

4.4.3  Algorithm  X  —  orthogonality  for  large  relative  gaps 

In this section, we complete the preliminary version of the method outlined in Algorithm 4.4.1. It is based on the differential qd-like transformations of Section 4.4.1. The following algorithm computes eigenvectors that are numerically orthogonal whenever the relative gaps between the eigenvalues of $LDL^T$ are large, i.e., $O(1)$.

Algorithm X [Computes eigenvectors using bidiagonals.]

1. Find $\mu \le \|T\|$ such that $T + \mu I$ is positive (or negative) definite.

2. Compute $T + \mu I = LDL^T$.

3. Compute the eigenvalues, $\hat\sigma_j^2$, of $LDL^T$ to high relative accuracy (by bisection or the dqds algorithm [56]).

4. For each computed eigenvalue, $\hat\lambda = \hat\sigma_j^2$, do the following

(a) Compute $LDL^T - \hat\lambda I = L_+ D_+ L_+^T$ by the dstqds transform (Algorithm 4.4.3).

(b) Compute $LDL^T - \hat\lambda I = U_- D_- U_-^T$ by the dqds transform (Algorithm 4.4.5).

(c) Compute $\gamma_k$ by the top formula of (4.4.26). Pick $r$ such that $|\gamma_r| = \min_k |\gamma_k|$.

(d) Form the approximate eigenvector $z_j = z_j^{(r)}$ by solving $N_r D_r N_r^T z_j = \gamma_r e_r$ (see Theorem 3.2.2):

\[ \begin{aligned} z_j(r) &= 1, \\ z_j(i) &= -L_+(i) \cdot z_j(i+1), && i = r-1, \ldots, 1, \\ z_j(l+1) &= -U_-(l) \cdot z_j(l), && l = r, \ldots, n-1. \end{aligned} \tag{4.4.46} \]

(e) If needed, compute $\mathrm{znrm} = \|z_j\|$ and set $\hat v_j = z_j / \mathrm{znrm}$.

□

We  will  refer  to  the  above  method  as  Algorithm  X  in  anticipation  of  Algorithm  Y 
which  will  handle  the  case  of  small  relative  gaps. 
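The steps of Algorithm X can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the thesis code: the recurrences used for dstqds and dqds are the standard differential qd forms (Algorithms 4.4.3 and 4.4.5 are not reproduced in this section), the twist quantity is formed as $\gamma_k = s_k + p_k + \hat\lambda$ (assumed here to agree with (4.4.26) in exact arithmetic), and eigenvalues are found by plain bisection on the pivot signs of $D_+$. The function names are hypothetical, and no safeguard against a zero pivot is included.

```python
import math

def ldl_factor(a, b):
    """Factor the tridiagonal matrix (diagonal a, off-diagonal b) as L D L^T."""
    d, l = [a[0]], []
    for i in range(len(b)):
        l.append(b[i] / d[i])
        d.append(a[i + 1] - l[i] * b[i])
    return l, d

def dstqds(l, d, lam):
    """Stationary transform: L D L^T - lam*I = L+ D+ L+^T (also returns s)."""
    n = len(d)
    s, Lp, Dp = [-lam], [], []
    for i in range(n - 1):
        Dp.append(s[i] + d[i])
        Lp.append(d[i] * l[i] / Dp[i])
        s.append(Lp[i] * l[i] * s[i] - lam)
    Dp.append(s[n - 1] + d[n - 1])
    return Lp, Dp, s

def dqds(l, d, lam):
    """Progressive transform: L D L^T - lam*I = U- D- U-^T (also returns p)."""
    n = len(d)
    p, Um = [0.0] * n, [0.0] * (n - 1)
    p[n - 1] = d[n - 1] - lam
    for i in range(n - 2, -1, -1):
        t = d[i] / (d[i] * l[i] ** 2 + p[i + 1])  # d_i / D-(i+1)
        Um[i] = l[i] * t
        p[i] = p[i + 1] * t - lam
    return Um, p

def count_below(l, d, x):
    """Eigenvalues of L D L^T below x = number of negative pivots D+(i)."""
    return sum(1 for piv in dstqds(l, d, x)[1] if piv < 0)

def bisect_eigenvalue(l, d, k, lo, hi):
    """k-th smallest eigenvalue (k = 0, 1, ...) inside [lo, hi]."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid == lo or mid == hi:      # interval cannot shrink further
            break
        if count_below(l, d, mid) > k:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def eigenvector(l, d, lam):
    """Steps 4(a)-(e): pick the twist index r, then use multiplications only."""
    n = len(d)
    Lp, _, s = dstqds(l, d, lam)
    Um, p = dqds(l, d, lam)
    gamma = [s[k] + p[k] + lam for k in range(n)]
    r = min(range(n), key=lambda k: abs(gamma[k]))
    z = [0.0] * n
    z[r] = 1.0
    for i in range(r - 1, -1, -1):      # upper part uses L+
        z[i] = -Lp[i] * z[i + 1]
    for i in range(r, n - 1):           # lower part uses U-
        z[i + 1] = -Um[i] * z[i]
    nrm = math.sqrt(sum(x * x for x in z))
    return [x / nrm for x in z]

# Small positive definite example, so mu = 0 is an admissible shift.
a = [4.0, 3.5, 3.0, 2.5, 2.0, 1.5]
b = [0.7] * 5
l, d = ldl_factor(a, b)
hi = max(a) + 2 * max(abs(x) for x in b)   # Gershgorin-type upper bound
vecs = [eigenvector(l, d, bisect_eigenvalue(l, d, k, 0.0, hi))
        for k in range(len(a))]
```

Each eigenvector is computed independently of the others and costs $O(n)$ operations once its eigenvalue is known; no Gram-Schmidt step appears anywhere in the sketch.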


(4.5.49) 




The $O$ in the above bounds will be replaced by the appropriate expressions in the formal
treatment given in Section 4.5.3. Here, and for the rest of this chapter, we assume that the
singular values are arranged in decreasing order, i.e., $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n$.

As a special case, we provide a rigorous proof that Algorithm X computes numerically
orthogonal eigenvectors whenever the eigenvalues of $LDL^T$ have large relative gaps.
The key to our success is that we exploit the relative perturbation properties of bidiagonal
matrices given in Section 4.3 by using carefully chosen inner loops in all our computations.

The bounds of (4.5.48) and (4.5.49) are meaningful only when the relative distances
are not too small. In this section, we are only interested in the order of magnitude of these
relative distances and not in their exact values. Thus in our discussions, quantities such as

$$\mathrm{relgap}(\hat\sigma_j, \sigma_i) \quad \text{and} \quad \mathrm{reldist}_p(\sigma_j, \sigma_i),$$

where $\sigma_j$ and $\hat\sigma_j$ agree in almost all their digits, are to be treated equivalently. We will, of
course, be precise in the formal statement of our theorems and their proofs.

4.5.1  A  Requirement  on  r 

As explained in Section 3.2, we require that the choice of the index $r$ in Step (4c)
of Algorithm X be such that the residual norm of the computed eigenpair, $|\gamma_r| / \|z_j\|$, is
“small”. For the rest of this thesis we assume that our new algorithms always choose such
a “good” $r$. We now explain why we can ensure such a choice of $r$.

1. Theorem 3.2.3 proves that there exists a “good” choice of $r$, i.e., $\exists\, r$ such that

$$\frac{|\gamma_r|}{\|z_j\|} \le \sqrt{n}\,|\sigma_j^2 - \hat\sigma_j^2|, \tag{4.5.50}$$

where $z_j$ is the vector computed in Step (4d) of Algorithm X. Note that in all our
methods to compute eigenvectors, $\hat\sigma_j$ will be a very good approximation to $\sigma_j$, i.e.,

$$\frac{|\sigma_j - \hat\sigma_j|}{|\sigma_j|} = O(\varepsilon), \tag{4.5.51}$$

and so the residual norm given by (4.5.50) will be small.

2. Theorem 3.2.1 showed that a “good” choice of $r$ is often revealed by a small $|\gamma_r|$ (this
fact is utilized in Step (4c) of Algorithm X). The following mild assumption on the
approximations $\hat\sigma_j^2$ and the separation of the eigenvalues $\sigma_j^2$,

$$\frac{|\sigma_j^2 - \hat\sigma_j^2|}{\mathrm{gap}(\hat\sigma_j^2, \{\sigma_i^2 \mid i \ne j\})} \le \frac{1}{2(n-1)}, \tag{4.5.52}$$

where $\mathrm{gap}(\hat\sigma_j^2, \{\sigma_i^2 \mid i \ne j\}) = \min_{i \ne j} |\hat\sigma_j^2 - \sigma_i^2|$, ensures that

$$|\gamma_r| \le 2n \cdot |\sigma_j^2 - \hat\sigma_j^2| \tag{4.5.53}$$

(note that we have obtained the above bound by choosing $M$ in Theorem 3.2.1 to
equal 2). Note that $\|z_j\| \ge 1$ and so a small value of $|\gamma_r|$ implies a small residual
norm. Recall that $\hat\sigma_j$ will satisfy (4.5.51) and so the eigenvalues have to be very close
together to violate (4.5.52). When the latter happens, there is a theoretical danger
that no $\gamma_k$ will be small, and we shortly see how to handle such a situation. However,
in all our extensive random numerical testing, we have never come across an example
where small gaps cause all $\gamma$'s to be large.

3. We can make a “good” choice of $r$ even in the situation described above by using
twisted Q factorizations that were discussed in Section 3.5. Theorems 3.5.3 and 3.5.4
indicate how to use these factorizations to choose an $r$ that satisfies (4.5.50). Of
course, Step (4c) of Algorithm X needs to be modified when $|\gamma_r|$ is not small enough.
For an alternate approach to compute a good $r$ that does not involve orthogonal
factorizations the reader is referred to [42].


In summary, we have seen how to ensure that $r$ is “good” and either (4.5.50)
or (4.5.53) is satisfied.
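The link between a small $|\gamma_r|$ and a small residual norm can be written out in two lines. The derivation below is a sketch that assumes the twisted-factorization identity of Step (4d) holds exactly; the shorthand $A = LDL^T$ is introduced here only for brevity.

```latex
\[
(A - \hat\sigma_j^2 I)\, z_j = \gamma_r e_r
\;\Longrightarrow\;
\frac{\|(A - \hat\sigma_j^2 I)\, z_j\|}{\|z_j\|} = \frac{|\gamma_r|}{\|z_j\|},
\qquad
\|z_j\| \ge |z_j(r)| = 1,
\]
\[
\text{so that}\quad
\frac{|\gamma_r|}{\|z_j\|} \le |\gamma_r| \le 2n\,|\sigma_j^2 - \hat\sigma_j^2|
\quad\text{whenever (4.5.53) holds.}
\]
```

The bound through (4.5.53) is weaker than (4.5.50) by a factor of at most $2\sqrt{n}$, but it only requires inspecting the $\gamma_k$ that the algorithm already computes.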

Of course, we look for a small residual norm since it implies that the computed
vector is close to the exact eigenvector. The following $\sin\theta$ theorem, see [110, Chapter 11],
is well known and shows that a small residual norm implies a good eigenvector if the
corresponding eigenvalue is isolated. The theorem, which we will often refer to in the next
few sections, is valid for all Hermitian matrices.


Theorem 4.5.1 Let $A = A^*$ have an isolated eigenvalue $\lambda$ with normalized eigenvector $v$.
Consider $y$, $y^* y = 1$, and real $\mu$ closer to $\lambda$ than to any other eigenvalue. Then

$$|\sin\angle(v, y)| \le \frac{\|Ay - y\mu\|_2}{\mathrm{gap}(\mu)},$$

where $\mathrm{gap}(\mu) = \min\{|\nu - \mu| : \nu \ne \lambda,\ \nu \in \mathrm{spectrum}(A)\}$.
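A small numerical check makes the statement of Theorem 4.5.1 concrete. The sketch below works with a diagonal matrix, which loses no generality for a Hermitian $A$ after an orthogonal change of basis, and takes $\mu$ to be the Rayleigh quotient of $y$. Both choices, and the function name, are assumptions made for illustration; the theorem allows any real $\mu$ that is closer to $\lambda$ than to the other eigenvalues.

```python
import math

def sin_theta_check(eigs, y, target):
    """Return (|sin angle(v, y)|, residual / gap) for A = diag(eigs), v = e_target."""
    nrm = math.sqrt(sum(c * c for c in y))
    y = [c / nrm for c in y]                        # enforce y*y = 1
    mu = sum(e * c * c for e, c in zip(eigs, y))    # Rayleigh quotient of y
    resid = math.sqrt(sum(((e - mu) * c) ** 2 for e, c in zip(eigs, y)))
    gap = min(abs(e - mu) for i, e in enumerate(eigs) if i != target)
    sin_angle = math.sqrt(max(0.0, 1.0 - y[target] ** 2))
    return sin_angle, resid / gap

# y is a perturbation of the eigenvector e_1 belonging to the eigenvalue 1.
s, bound = sin_theta_check([1.0, 2.0, 4.0], [1.0, 0.1, 0.1], 0)
print(s, bound)
```

The first returned number is the actual sine of the angle and the second is the right-hand side of the theorem; the gap in the denominator is what degrades the bound when eigenvalues are close.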


The extension of this theorem to higher-dimensional subspaces is due to Davis and
Kahan, see [26] and [27] for more details.

[Commutative diagram]

$Z_k$ ($u_j$) --- dtwqds, computed ---> $\hat Z_k$ ($\hat v_j$)
$\bar Z$ ($\bar u_j$) --- dtwqds, exact ---> $\tilde Z_k$ ($\tilde v_j$)

From $Z$ to $\bar Z$, change each
  $d_i$ by 1 ulp, $1 \le i < k$,
  $l_i$ by 3 ulps, $1 \le i < k$,
  $d_k$ by 4 ulps, $l_k$ by 3½ ulps,
  $d_i$ by 3 ulps, $k < i \le n$,
  $l_i$ by 3 ulps, $k < i < n$.

From $\hat Z_k$ to $\tilde Z_k$, change each
  $D_+(i)$ by 2 ulps, $1 \le i < k$,
  $L_+(i)$ by 3 ulps, $1 \le i < k$,
  $\gamma_k$ by 2 ulps, $U_-(k)$ by 4½ ulps,
  $D_-(i)$ by 2 ulps, $k < i \le n$,
  $U_-(i)$ by 4 ulps, $k < i < n$.

Figure 4.5: dtwqds transformation applied to compute an eigenvector

4.5.2  Outline  of  Argument 

We alert the reader to the fact that the analysis to follow involves close but different
quantities such as $\hat v$, $\tilde v$, $L$, $\bar L$, $\tilde L$, etc. So watch the overbars carefully. Recall that quantities
with a tilde or a bar on top are ideal whereas others like $d_i$ and $\hat D_+(i)$ are stored in the computer.

We repeat the commutative diagram for the dtwqds transformation in Figure 4.5
for easy reference. We will relate the computed vector $\hat v_j$ to an eigenvector $u_j$ of $LDL^T$ by
first relating it to the intermediate vectors $\tilde v_j$ and $\bar u_j$. We have associated these vectors with
the various $Z$ arrays in Figure 4.5, and the reader may find it helpful to refer to this figure
during our upcoming exposition. Before we give the formal proofs, we sketch a detailed
outline.

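1. The vector computed in Step (4d) of Algorithm X is formed only by multiplications.
As a result, the computed vector $\hat v_j$ is a small componentwise perturbation of the
vector $\tilde v_j$ which is the exact solution to

$$(\bar L \bar D \bar L^T - \hat\sigma_j^2 I)\,\tilde v_j = \frac{\tilde\gamma_r}{\|\tilde z_j\|}\, e_r,$$

i.e.,

$$\frac{|\hat v_j(i) - \tilde v_j(i)|}{|\tilde v_j(i)|} = O(n\varepsilon) \quad \text{for } i = 1, 2, \ldots, n, \tag{4.5.54}$$

and $\bar L$, $\bar D$ are the matrices represented by $\bar Z$ in Figure 4.5. For a formal proof, see
Theorem 4.5.2 and Corollary 4.5.3 below.

2. To relate $\tilde v_j$ to an eigenvector $\bar u_j$ of $\bar L \bar D \bar L^T$, we invoke Theorem 4.5.1 to show that

$$|\sin\angle(\tilde v_j, \bar u_j)| \le \frac{|\tilde\gamma_r|}{\|\tilde z_j\| \cdot \mathrm{gap}(\hat\sigma_j^2, \{\bar\sigma_i^2 \mid i \ne j\})}, \tag{4.5.55}$$

where $\mathrm{gap}(\hat\sigma_j^2, \{\bar\sigma_i^2 \mid i \ne j\}) = \min_{i \ne j} |\hat\sigma_j^2 - \bar\sigma_i^2|$, $\bar\sigma_i^2$ being the $i$th eigenvalue of $\bar L \bar D \bar L^T$.
Section 4.5.1 explained how we can always ensure that the residual norm is small, i.e.,
$|\tilde\gamma_r| / \|\tilde z_j\| = O(n\varepsilon\,\bar\sigma_j^2)$. By substituting this value in (4.5.55), the absolute gap turns
into a relative gap and we get

$$|\sin\angle(\tilde v_j, \bar u_j)| = \frac{O(n\varepsilon)}{\mathrm{relgap}(\hat\sigma_j^2, \{\bar\sigma_i^2 \mid i \ne j\})}. \tag{4.5.56}$$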

3. Next we relate $\bar u_j$ to an eigenvector $u_j$ of $LDL^T$. $\bar L$ and $\bar D$ are small componentwise
perturbations of $L$ and $D$ as shown by our roundoff error analysis of the dtwqds
transformation in Theorem 4.4.4. By the properties of a perturbed bidiagonal matrix
$\bar u_j$ can be related to $u_j$ (see Corollary 4.3.2), i.e.,

$$|\sin\angle(\bar u_j, u_j)| = \frac{O(n\varepsilon)}{\mathrm{relgap}_2(\sigma_j, \{\sigma_i \mid i \ne j\})}. \tag{4.5.57}$$

The reader should note that the matrices $\tilde L$, $\tilde D$ and $\bar L$, $\bar D$ depend on $\hat\sigma_j^2$ whereas $LDL^T$
is the fixed representation. By (4.5.54), (4.5.56) and (4.5.57), we have related vectors
computed by Algorithm X to eigenvectors of $LDL^T$,

$$|\sin\angle(\hat v_j, u_j)| = \frac{O(n\varepsilon)}{\mathrm{relgap}_2(\sigma_j, \{\sigma_i \mid i \ne j\})}.$$

Theorem 4.5.3 below contains the details.

Similarly, using the Davis-Kahan $\sin\Theta$ theorem [26, 27] and the subspace theorems
given in Section 4.3, it can be shown that

$$|\sin\angle(\hat v_j, U^{1:j})| = \frac{O(n\varepsilon)}{\mathrm{reldist}_2(\sigma_j, \sigma_{j+1})}, \tag{4.5.58}$$

$$\text{and} \quad |\sin\angle(\hat v_j, U^{j:n})| = \frac{O(n\varepsilon)}{\mathrm{reldist}_2(\sigma_j, \sigma_{j-1})}, \tag{4.5.59}$$

where $U^{1:j}$ and $U^{j:n}$ denote the invariant subspaces spanned by $u_1, u_2, \ldots, u_j$ and
$u_j, \ldots, u_n$ respectively. See Theorems 4.5.4 and 4.5.5 for the details.


4. The dot product between the neighboring vectors $\hat v_j$ and $\hat v_{j+1}$ can now be bounded
since

$$
\begin{aligned}
|\cos\angle(\hat v_j, \hat v_{j+1})|
&\le \left|\cos\left(\tfrac{\pi}{2} - \left(\angle(\hat v_j, U^{1:j}) + \angle(\hat v_{j+1}, U^{j+1:n})\right)\right)\right| \\
&\le |\sin\angle(\hat v_j, U^{1:j})| + |\sin\angle(\hat v_{j+1}, U^{j+1:n})|.
\end{aligned}
$$

By (4.5.58) and (4.5.59), we get

$$|\hat v_j^T \hat v_{j+1}| = \frac{O(n\varepsilon)}{\mathrm{reldist}_2(\sigma_j, \sigma_{j+1})}.$$

For details, see Corollary 4.5.2 below. The result (4.5.48) can be shown in the same
way. Hence if all the relative gaps between eigenvalues of $LDL^T$ are large, the vectors
computed by Algorithm X are numerically orthogonal.

5. The final step in the proof is to show that the residual norms with respect to the
input tridiagonal matrix $T$ are small, i.e., (4.5.47) is satisfied. Since we can always
ensure a small value of $|\tilde\gamma_r| / \|\tilde z_j\|$, we can show that we always compute eigenpairs with
small residual norms, irrespective of the relative gaps. See Theorem 4.5.6 for details.
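The step in item 2 where the absolute gap turns into a relative gap can be spelled out as follows. This is an order-of-magnitude sketch: it assumes that relgap of a point with respect to a set is, up to a modest constant, the absolute gap divided by the size of the eigenvalue, which is how the outline uses the term; the precise definition is the one given earlier in the thesis.

```latex
\[
|\sin\angle(\tilde v_j, \bar u_j)|
\;\le\;
\frac{|\tilde\gamma_r| / \|\tilde z_j\|}
     {\mathrm{gap}(\hat\sigma_j^2, \{\bar\sigma_i^2 \mid i \ne j\})}
\;=\;
\frac{O(n\varepsilon\,\bar\sigma_j^2)}
     {\mathrm{gap}(\hat\sigma_j^2, \{\bar\sigma_i^2 \mid i \ne j\})}
\;=\;
\frac{O(n\varepsilon)}
     {\mathrm{gap}(\hat\sigma_j^2, \{\bar\sigma_i^2 \mid i \ne j\}) \,/\, \bar\sigma_j^2}.
\]
```

The factor $\bar\sigma_j^2$ in the numerator is available only because the eigenvalue was computed to high relative accuracy; with merely absolute accuracy the numerator would be $O(n\varepsilon\,\|LDL^T\|)$ and the absolute gap would remain in the bound.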


4.5.3  Formal  Proof 

Now,  the  formal  analysis  begins.  We  start  by  showing  that  the  computed  vector 
is  very  close  to  an  ideal  vector. 


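Theorem 4.5.2 Let $\hat N_r$ and $\hat D_r$, $\tilde N_r$ and $\tilde D_r$ be the twisted factors represented by $\hat Z_r$ and
$\tilde Z_r$ respectively in Figure 4.5 (see Theorem 4.4.4 also). Let $\hat z_j$ be the value of $z_j$ computed
in Step (4d) of Algorithm X, and let $\tilde z_j$ be the exact solution of

$$\tilde N_r \tilde D_r \tilde N_r^T \tilde z_j = \tilde\gamma_r e_r.$$

Then $\hat z_j$ is a small relative perturbation of $\tilde z_j$. More specifically,

$$\hat z_j(r) = \tilde z_j(r) = 1, \qquad \hat z_j(i) = \tilde z_j(i) \cdot (1 + \eta_i), \quad i \ne r, \tag{4.5.60}$$

where

$$
|\eta_i| \le
\begin{cases}
4(r - i)\varepsilon, & 1 \le i < r, \\
5(i - r)\varepsilon, & r < i \le n.
\end{cases}
\tag{4.5.61}
$$

Proof. Accounting for the rounding error in (4.4.46) of Algorithm X, we get

$$\hat z_j(i) = -\hat L_+(i) \cdot \hat z_j(i+1) \cdot (1 + \varepsilon_i^{\circ}).$$

Now replace $\hat L_+$ by $\tilde L_+$ using (4.4.31) in Theorem 4.4.2 and set $i = r - 1$ to get

$$\hat z_j(r-1) = -\tilde L_+(r-1) \cdot (1 + \eta^+_{r-1})\, \hat z_j(r) \cdot (1 + \varepsilon^{\circ}_{r-1}) = \tilde z_j(r-1) \cdot (1 + \eta^+_{r-1})(1 + \varepsilon^{\circ}_{r-1}),$$

where $|\eta^+_{r-1}| \le 3\varepsilon$. Thus (4.5.60) holds for $i = r - 1$ (note that $\varepsilon^{\circ}_{r-1} = 0$ since $\hat z_j(r) = 1$),
and similarly for $i < r - 1$. Rounding errors can analogously be attributed to the lower half
of $\hat z_j$ by

$$\hat z_j(l+1) = -\hat U_-(l) \cdot \hat z_j(l) \cdot (1 + \varepsilon^{\circ}_{l+1}).$$

Replacing $\hat U_-$ by $\tilde U_-$ using (4.4.39) and setting $l = r$, we obtain

$$\hat z_j(r+1) = -\tilde U_-(r) \cdot \hat z_j(r) \cdot (1 + \eta^-_{r+1})(1 + \varepsilon^{\circ}_{r+1}) = \tilde z_j(r+1) \cdot (1 + \eta^-_{r+1})(1 + \varepsilon^{\circ}_{r+1}),$$

where $|\eta^-_{r+1}| \le 4.5\varepsilon$. Thus (4.5.60) holds for $i = r + 1$ (note that $\varepsilon^{\circ}_{r+1} = 0$), and by
using (4.4.38) we can similarly show that it holds for $i > r + 1$. □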
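The mechanism behind Theorem 4.5.2 is that a vector built purely from multiplications inherits only a small relative error in every component, however widely the components vary in magnitude. The sketch below isolates the rounding contribution of the multiplications: it simulates single precision with the standard `struct` module, uses exact rational arithmetic as the reference, and compares the componentwise relative error with $k$ unit roundoffs after $k$ products. The factor values are arbitrary stand-ins for the entries of $L_+$, and the helper names are hypothetical.

```python
import struct
from fractions import Fraction

def fl32(x):
    """Round a Python float to the nearest IEEE single precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def chain(factors, one, rnd):
    """z(r) = 1, then z(i) = -L(i) * z(i+1), rounding each product with rnd."""
    z = [one]
    for f in factors:
        z.append(rnd(-f * z[-1]))
    return z

# Stand-ins for L+(r-1), ..., L+(1), already representable in single precision.
factors = [fl32(0.6 + 0.1 * k) for k in range(9)]

computed = chain(factors, 1.0, fl32)                        # rounded recurrence
exact = chain([Fraction(f) for f in factors], Fraction(1), lambda q: q)

u = 2.0 ** -24                                              # single precision unit roundoff
rel_err = [abs(Fraction(c) - e) / abs(e) for c, e in zip(computed, exact)]
for k, err in enumerate(rel_err):
    print(k, float(err) / u)                                # error in units of u, at most about k
```

The first product is exact because it multiplies by $z(r) = 1$, which mirrors the remark in the proof that $\varepsilon^{\circ}_{r-1} = 0$. The remaining part of the constants 4 and 5 in (4.5.61) comes from the relative errors in $L_+$ and $U_-$ themselves, which this sketch does not model.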

The  following  theorem  is  at  the  heart  of  the  results  of  this  section. 


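Theorem 4.5.3 Let $(\sigma_j^2, u_j)$ denote the $j$th eigenpair of $LDL^T$, and let $\hat\sigma_j^2$ be the
approximation used to compute $\hat z_j = \hat z_j^{(r)}$ by Step (4d) of Algorithm X, where $r$ is an index such
that

$$\frac{|\tilde\gamma_r|}{\|\tilde z_j\|} \le \sqrt{n}\,|\hat\sigma_j^2 - \bar\sigma_j^2|. \tag{4.5.62}$$

Then

$$|\sin\angle(\hat z_j, u_j)| \le 5n\varepsilon + \frac{\sqrt{n}\,|\hat\sigma_j^2 - \bar\sigma_j^2|}{\mathrm{gap}(\hat\sigma_j^2, \{\bar\sigma_i^2 \mid i \ne j\})} + \frac{3n\varepsilon}{\mathrm{relgap}_2(\sigma_j, \{\sigma_i \mid i \ne j\})},$$

where $\bar\sigma_i^2$ is the $i$th eigenvalue of $\bar L \bar D \bar L^T$ (see Figure 4.5) and differs from $\sigma_i^2$ by a few ulps.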


Proof. As we outlined in Section 4.5.2, to prove this result we will relate $\hat z_j$ to $u_j$ by first
relating $\hat z_j$ to an eigenvector of the intermediate matrix $\bar L \bar D \bar L^T$ (see Figure 4.5). Theorem 4.5.2
links $\hat z_j$ to $\tilde z_j$ and implies that

$$|\sin\angle(\hat z_j, \tilde z_j)| \le \frac{\|\hat z_j - \tilde z_j\|}{\|\tilde z_j\|} \le 5n\varepsilon. \tag{4.5.63}$$

Note that $\tilde z_j$ is the exact solution to

$$(\bar L \bar D \bar L^T - \hat\sigma_j^2 I)\,\tilde z_j = \tilde\gamma_r e_r.$$

Let $(\bar\sigma_j^2, \bar u_j)$ denote the $j$th eigenpair of $\bar L \bar D \bar L^T$. By Theorem 4.5.1,

$$|\sin\angle(\tilde z_j, \bar u_j)| \le \frac{|\tilde\gamma_r|}{\|\tilde z_j\| \cdot \mathrm{gap}(\hat\sigma_j^2, \{\bar\sigma_i^2 \mid i \ne j\})}.$$

Since $r$ is such that (4.5.62) is satisfied (recall that Section 4.5.1 explains why this bound
can always be satisfied), we get

$$|\sin\angle(\tilde z_j, \bar u_j)| \le \frac{\sqrt{n}\,|\hat\sigma_j^2 - \bar\sigma_j^2|}{\mathrm{gap}(\hat\sigma_j^2, \{\bar\sigma_i^2 \mid i \ne j\})}. \tag{4.5.64}$$

Since $\bar L$ and $\bar D$ are small relative perturbations of $L$ and $D$, $\bar\sigma_i$ is a small relative perturbation
of $\sigma_i$ by Corollary 4.3.1. In addition,

$$|\sin\angle(\bar u_j, u_j)| \le \frac{3n\varepsilon}{\mathrm{relgap}_2(\sigma_j, \{\sigma_i \mid i \ne j\})}, \tag{4.5.65}$$

where we obtained the above bound by converting all the ulp changes in entries of $L$ and
$D$ given in Theorem 4.4.4 to ulp changes in the off-diagonal and diagonal elements of
$LD^{1/2}$, and then applying Corollary 4.3.2. The result now follows from (4.5.63), (4.5.64)
and (4.5.65) since

$$|\sin\angle(\hat z_j, u_j)| \le |\sin\angle(\hat z_j, \tilde z_j)| + |\sin\angle(\tilde z_j, \bar u_j)| + |\sin\angle(\bar u_j, u_j)|.$$


We can generalize the above result to bound the angle between the computed vector and
the invariant subspaces of $LDL^T$.

Theorem 4.5.4 Let $\hat\sigma_j^2$ be the approximation used to compute $\hat z_j$ by Step (4d) of
Algorithm X, and $r$ be such that (4.5.62) is satisfied. Let $(\sigma_j^2, u_j)$ be the $j$th eigenpair of $LDL^T$
and let $U^{1:k}$ denote the subspace spanned by $u_1, \ldots, u_k$. Then for $j \le k < n$,

$$|\sin\angle(\hat z_j, U^{1:k})| \le 5n\varepsilon + \frac{\sqrt{n}\,|\hat\sigma_j^2 - \bar\sigma_j^2|}{\mathrm{gap}(\hat\sigma_j^2, \{\bar\sigma_{k+1}^2\})} + \frac{3n\varepsilon}{\mathrm{relgap}_2(\sigma_j, \{\sigma_{k+1}\})},$$

where $\bar\sigma_i^2$ is the $i$th eigenvalue of $\bar L \bar D \bar L^T$ (see Figure 4.5) and differs from $\sigma_i^2$ by a few ulps.

Proof. The proof is almost identical to that of Theorem 4.5.3 above, except that
instead of applying Theorem 4.5.1 and Corollary 4.3.2 to get bounds on the angles between
individual vectors, we apply the Davis-Kahan $\sin\Theta$ Theorem [26, 27] and Corollary 4.3.3
to get similar bounds on the angles between corresponding subspaces. □

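Theorem 4.5.5 Let $\hat\sigma_j^2$ be the approximation used to compute $\hat z_j$ by Step (4d) of
Algorithm X, and $r$ be such that (4.5.62) is satisfied. Let $(\sigma_j^2, u_j)$ be the $j$th eigenpair of $LDL^T$
and let $U^{k:n}$ denote the subspace spanned by $u_k, \ldots, u_n$. Then for $1 < k \le j$,

$$|\sin\angle(\hat z_j, U^{k:n})| \le 5n\varepsilon + \frac{\sqrt{n}\,|\hat\sigma_j^2 - \bar\sigma_j^2|}{\mathrm{gap}(\hat\sigma_j^2, \{\bar\sigma_{k-1}^2\})} + \frac{3n\varepsilon}{\mathrm{relgap}_2(\sigma_j, \{\sigma_{k-1}\})},$$

where $\bar\sigma_i^2$ is the $i$th eigenvalue of $\bar L \bar D \bar L^T$ (see Figure 4.5) and differs from $\sigma_i^2$ by a few ulps.

Finally we can bound the dot products between the computed eigenvectors using the above
results.


Corollary 4.5.1 Let $\hat\sigma_j^2$ and $\hat\sigma_m^2$ be the approximations used to compute $\hat z_j$ and $\hat z_m$
respectively by Step (4d) of Algorithm X. Let the twist indices $r$ for both these computations
satisfy (4.5.62). Then for $j < m$,

$$
\frac{|\hat z_j^T \hat z_m|}{\|\hat z_j\| \cdot \|\hat z_m\|}
\le 10n\varepsilon + \min_{k = j, \ldots, m-1}
\left\{
\frac{\sqrt{n}\,|\hat\sigma_j^2 - \bar\sigma_j^2|}{\mathrm{gap}(\hat\sigma_j^2, \{\bar\sigma_{k+1}^2\})}
+ \frac{\sqrt{n}\,|\hat\sigma_m^2 - \bar\sigma_m^2|}{\mathrm{gap}(\hat\sigma_m^2, \{\bar\sigma_k^2\})}
+ \frac{3n\varepsilon}{\mathrm{relgap}_2(\sigma_j, \{\sigma_{k+1}\})}
+ \frac{3n\varepsilon}{\mathrm{relgap}_2(\sigma_m, \{\sigma_k\})}
\right\},
$$

where $\bar\sigma_i^2$ is the $i$th eigenvalue of $\bar L \bar D \bar L^T$ (see Figure 4.5) and differs from $\sigma_i^2$ by a few ulps.


Proof. The cosine of the angle between the computed vectors can be bounded by

$$
\begin{aligned}
|\cos\angle(\hat z_j, \hat z_m)|
&\le \left|\cos\left(\tfrac{\pi}{2} - \left(\angle(\hat z_j, U^{1:k}) + \angle(\hat z_m, U^{k+1:n})\right)\right)\right| \\
&\le |\sin\angle(\hat z_j, U^{1:k})| + |\sin\angle(\hat z_m, U^{k+1:n})|,
\end{aligned}
$$

where $U^{1:k}$ and $U^{k+1:n}$ are as in Theorems 4.5.4 and 4.5.5. The result now follows by
applying the results of these theorems, and then choosing $k$ to be the index where the
bound is minimum. □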
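The trigonometric step used in the proof above can be made explicit. The following lines are a sketch that writes $\theta_1 = \angle(\hat z_j, U^{1:k})$ and $\theta_2 = \angle(\hat z_m, U^{k+1:n})$ (symbols introduced here for brevity) and assumes both angles lie in $[0, \pi/2]$, as angles between a vector and a subspace do.

```latex
\[
\left|\cos\!\left(\tfrac{\pi}{2} - (\theta_1 + \theta_2)\right)\right|
= |\sin(\theta_1 + \theta_2)|
= |\sin\theta_1 \cos\theta_2 + \cos\theta_1 \sin\theta_2|
\le \sin\theta_1 + \sin\theta_2 ,
\]
\[
\text{since } 0 \le \cos\theta_1,\ \cos\theta_2 \le 1 .
\]
```

Geometrically, $U^{1:k}$ and $U^{k+1:n}$ are orthogonal complements, so a vector close to one and a vector close to the other must be nearly orthogonal to each other.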


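Corollary 4.5.2 Let $\hat\sigma_j^2$ and $\hat\sigma_{j+1}^2$ be the approximations used to compute $\hat z_j$ and $\hat z_{j+1}$
respectively by Step (4d) of Algorithm X. Let the indices $r$ for both these computations
satisfy (4.5.62). Then

$$
\frac{|\hat z_j^T \hat z_{j+1}|}{\|\hat z_j\| \cdot \|\hat z_{j+1}\|}
\le 10n\varepsilon
+ \frac{\sqrt{n}\,|\hat\sigma_j^2 - \bar\sigma_j^2|}{\mathrm{gap}(\hat\sigma_j^2, \{\bar\sigma_{j+1}^2\})}
+ \frac{\sqrt{n}\,|\hat\sigma_{j+1}^2 - \bar\sigma_{j+1}^2|}{\mathrm{gap}(\hat\sigma_{j+1}^2, \{\bar\sigma_j^2\})}
+ \frac{3n\varepsilon}{\mathrm{relgap}_2(\sigma_j, \{\sigma_{j+1}\})}
+ \frac{3n\varepsilon}{\mathrm{relgap}_2(\sigma_{j+1}, \{\sigma_j\})},
$$

where $\bar\sigma_i^2$ is the $i$th eigenvalue of $\bar L \bar D \bar L^T$ (see Figure 4.5) and differs from $\sigma_i^2$ by a few ulps.

Proof. This is a special case of Corollary 4.5.1. □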

Instead of assuming (4.5.62), if we assume that

$$|\tilde\gamma_r| \le 2n \cdot |\hat\sigma_j^2 - \bar\sigma_j^2|, \tag{4.5.66}$$

we get a bound that is weaker by a factor of only $2\sqrt{n}$. If such an index $r$ is chosen
in Algorithm X, we can modify the middle terms in Corollaries 4.5.1 and 4.5.2 by using
the above bound. See Section 4.5.1 to see how we can ensure that either one of (4.5.62)
or (4.5.66) is satisfied.

Since we can compute the singular values of a bidiagonal matrix to high relative
accuracy, we can find $\hat\sigma_j$ such that

$$|\sigma_j - \hat\sigma_j| \le h(n) \cdot \varepsilon \cdot \sigma_j, \tag{4.5.67}$$

where $h$ is a slowly growing function of $n$.

For  the  sake  of  completeness,  we  prove  the  well  known  fact  that  normalizing  Zj,  as 
in  Step  (4e)  of  Algorithm  X,  does  not  change  the  accuracy  of  the  computed  eigenvectors. 

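Corollary 4.5.3 Let $\hat z_j$ and $\tilde z_j$ be as in Theorem 4.5.2. Let $\tilde v_j = \tilde z_j / \|\tilde z_j\|$, and $\hat v_j$ be the
computed value of $\hat z_j / \|\hat z_j\|$ (see Step (4e) of Algorithm X). Then

$$\hat v_j(i) = \tilde v_j(i) \cdot (1 + \epsilon_i), \tag{4.5.68}$$

where $|\epsilon_i| \le (n + 2)\varepsilon + |\eta_i| + \max_i |\eta_i|$, and $\eta_i$ is as in (4.5.61).


Proof. The computed value of $\|\hat z_j\|$ equals $\|\hat z_j\| \cdot (1 + \varepsilon_{\|})$ where $|\varepsilon_{\|}| \le (n + 1)\varepsilon$. Thus

$$\hat v_j(i) = \frac{\hat z_j(i)}{\|\hat z_j\|} \cdot \frac{1 + \varepsilon_{/}}{1 + \varepsilon_{\|}}. \tag{4.5.69}$$

By (4.5.60) of Theorem 4.5.2,

$$\|\hat z_j\| = \left( \sum_{i=1}^{n} \tilde z_j(i)^2 \cdot (1 + \eta_i)^2 \right)^{1/2}.$$

It follows that

$$\|\hat z_j\| = \|\tilde z_j\| \cdot (1 + \eta_{\max}), \quad \text{where } |\eta_{\max}| \le \max_i |\eta_i|. \tag{4.5.70}$$

Substituting (4.5.60) and (4.5.70) in (4.5.69), we get

$$\hat v_j(i) = \frac{\tilde z_j(i)}{\|\tilde z_j\|} \cdot \frac{(1 + \varepsilon_{/})(1 + \eta_i)}{(1 + \varepsilon_{\|})(1 + \eta_{\max})},$$

and the result follows. □

Thus for the eigenpairs computed by Algorithm X, the following result holds.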


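Corollary 4.5.4 Let $(\hat\sigma_j^2, \hat v_j)$ be the approximate eigenpairs computed by Algorithm X.
Assuming that (4.5.66) holds,

$$
|\hat v_j^T \hat v_{j+1}|
\le 22n\varepsilon
+ \frac{2n\,|\hat\sigma_j^2 - \bar\sigma_j^2|}{\mathrm{gap}(\hat\sigma_j^2, \{\bar\sigma_{j+1}^2\})}
+ \frac{2n\,|\hat\sigma_{j+1}^2 - \bar\sigma_{j+1}^2|}{\mathrm{gap}(\hat\sigma_{j+1}^2, \{\bar\sigma_j^2\})}
+ \frac{3n\varepsilon}{\mathrm{relgap}_2(\sigma_j, \{\sigma_{j+1}\})}
+ \frac{3n\varepsilon}{\mathrm{relgap}_2(\sigma_{j+1}, \{\sigma_j\})},
\tag{4.5.71}
$$

and by the relative accuracy of $\hat\sigma_j$ (see (4.5.67)),

$$
|\hat v_j^T \hat v_{j+1}|
\le 22n\varepsilon
+ \frac{2n\,h(n)\,\varepsilon}{\mathrm{relgap}(\hat\sigma_j^2, \{\bar\sigma_{j+1}^2\})}
+ \frac{2n\,h(n)\,\varepsilon}{\mathrm{relgap}(\hat\sigma_{j+1}^2, \{\bar\sigma_j^2\})}
+ \frac{3n\varepsilon}{\mathrm{relgap}_2(\sigma_j, \{\sigma_{j+1}\})}
+ \frac{3n\varepsilon}{\mathrm{relgap}_2(\sigma_{j+1}, \{\sigma_j\})},
\tag{4.5.72}
$$

where $\bar\sigma_i^2$ is the $i$th eigenvalue of $\bar L \bar D \bar L^T$ (see Figure 4.5) and differs from $\sigma_i^2$ by a few ulps.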


Note that we have chosen to exhibit the result in two forms, namely, (4.5.71) and
(4.5.72). The difference is in the second and third terms of the two bounds. Since we only
care about each term in the bound being $O(\varepsilon)$, often it may not be necessary to find an
eigenvalue to full relative accuracy. For example, suppose $\sigma_n^2 = \varepsilon$ and $\sigma_{n-1}^2 = 1$; then there
is no need to find all the correct digits of $\sigma_n^2$. Instead, absolute accuracy will suffice and be
more efficient in such a case. The first result (4.5.71) makes this clear.

We  now  show  that  irrespective  of  the  relative  gaps,  Algorithm  X  always  computes 
eigenpairs  with  small  residual  norms. 

Theorem 4.5.6 Let $T + \mu I$ be a positive definite tridiagonal matrix and let $(\hat\sigma_j^2, \hat v_j)$ be the
approximate eigenpairs computed by Algorithm X. Then their residual norms are small, i.e.,
for $j = 1, 2, \ldots, n$,

$$\|(T + \mu I - \hat\sigma_j^2 I)\,\hat v_j\| \le \left( 2\|T\|_1 \left(2n\,h(n) + 11n^{3/2} + 4\right) + 48 \right) \cdot \varepsilon, \tag{4.5.73}$$

assuming that (4.5.66) holds.

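Proof. We refer the reader to Figure 4.5 for notation used in this proof. Recall that by
definition the vector $\tilde v_j$ is the exact solution to

$$(\bar L \bar D \bar L^T - \hat\sigma_j^2 I)\,\tilde v_j = \frac{\tilde\gamma_r}{\|\tilde z_j\|}\, e_r. \tag{4.5.74}$$

The elements of $\bar L$ and $\bar D$ are small relative perturbations of the elements of $L$ and $D$. In
particular,

$$\bar d_i = d_i\,(1 + \eta_i^1), \tag{4.5.75}$$

$$\bar d_i \bar l_i = d_i l_i\,(1 + \eta_i^2), \tag{4.5.76}$$

$$\bar d_i \bar l_i^2 = d_i l_i^2\,(1 + \eta_i^3), \tag{4.5.77}$$

where

$$|\eta_i^1| \le 4\varepsilon, \quad |\eta_i^2| \le 3.5\varepsilon, \quad \text{and} \quad |\eta_i^3| \le 5\varepsilon \tag{4.5.78}$$

(these bounds on the $\eta$'s can be deduced from Theorems 4.4.2, 4.4.3 and 4.4.4). Consider the
$i$th equation of (4.5.74),

$$\bar d_{i-1} \bar l_{i-1}\,\tilde v_j(i-1) + (\bar d_{i-1} \bar l_{i-1}^2 + \bar d_i - \hat\sigma_j^2)\,\tilde v_j(i) + \bar d_i \bar l_i\,\tilde v_j(i+1) = K_i,$$

where $K_i = 0$ if $i \ne r$ and $K_r = \tilde\gamma_r / \|\tilde z_j\|$. Substituting (4.5.68), (4.5.75), (4.5.76) and (4.5.77)
into the above equation, we get

$$d_{i-1} l_{i-1}\,\hat v_j(i-1) + (d_{i-1} l_{i-1}^2 + d_i - \hat\sigma_j^2)\,\hat v_j(i) + d_i l_i\,\hat v_j(i+1) = K_i + \delta_i,$$

where

$$
\begin{aligned}
|\delta_i| \le{} & |\epsilon_{i-1}\, d_{i-1} l_{i-1}| + |\epsilon_i| \cdot \left(|d_{i-1} l_{i-1}^2| + |d_i|\right) + |\epsilon_{i+1}\, d_i l_i| \\
& + |\eta_{i-1}^2\, \hat v_j(i-1)| + \left(|\eta_{i-1}^3| + |\eta_i^1|\right) \cdot |\hat v_j(i)| + |\eta_i^2\, \hat v_j(i+1)|.
\end{aligned}
$$

In the above, the $\epsilon_i$'s are as in (4.5.68). Since $|\epsilon_i| \le 11n\varepsilon$ and by (4.5.78), we can bound
$\delta_i$ by

$$|\delta_i| \le 11n\varepsilon \cdot \|LDL^T\|_1 + 16\varepsilon \cdot \max\{\hat v_j(i-1), \hat v_j(i), \hat v_j(i+1)\}.$$

Thus

$$(LDL^T - \hat\sigma_j^2 I)\,\hat v_j = r_j, \tag{4.5.79}$$

where $r_j(i) = K_i + \delta_i$. Substituting the bound for $|\tilde\gamma_r|$ from (4.5.66), we get

$$\|r_j\| \le \left(2n\,h(n)\,\sigma_j^2 + 11n^{3/2}\,\|LDL^T\|_1 + 48\right) \cdot \varepsilon \le \left(\left(2n\,h(n) + 11n^{3/2}\right)\|LDL^T\|_1 + 48\right) \cdot \varepsilon.$$

By the backward stability of the Cholesky factorization (see Corollary 4.4.1),

$$T + \delta T + \mu I = LDL^T, \tag{4.5.80}$$

where $\|\delta T\|_1 \le 8\varepsilon\,\|T\|_1$. From (4.5.79) and (4.5.80),

$$(T + \mu I - \hat\sigma_j^2 I)\,\hat v_j = (LDL^T - \hat\sigma_j^2 I - \delta T)\,\hat v_j = r_j - \delta T \cdot \hat v_j,$$

and the result follows since $\|LDL^T\|_1 \le 2\|T\|_1$ (recall that $|\mu| \le \|T\|$). □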


4.5.4  Discussion  of  Error  Bounds 

The careful reader may worry about the $n\,h(n)$ and $n^{3/2}$ terms in the bounds on the
residual norms of (4.5.73), and the $n\,h(n)$ term in the dot product bound given by (4.5.72).
The fear is that when $n$ is large enough, these terms are no longer negligible and may lead to
a loss in accuracy. We reassure the reader that our error bounds can be overly pessimistic
and this is borne out by numerical experience.

Figure 4.6: An eigenvector of a tridiagonal: most of its entries are negligible

Often, an eigenvector of a tridiagonal is non-negligible only in a tiny fraction
of its entries, see Figure 4.6. When this happens we say that the eigenvector has small
“support”. In such a case, the error bounds of (4.5.72) and (4.5.73) can effectively be
reduced by replacing $n$ with $|\mathrm{supp}|$, which denotes the support of the eigenvector under
consideration. For example, as in Figure 4.6, we may have $n = 1000$ but $|\mathrm{supp}|$ may be just
over 50 for some eigenvectors.
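One way to make the notion of support concrete is to count the entries whose magnitude exceeds a small fraction of the largest entry. The sketch below is only an illustration: the function name and the relative threshold `tol` are assumptions, since the text does not fix a precise cutoff for "negligible", and the test vector is a synthetic bump rather than the eigenvector of Figure 4.6.

```python
import math

def support_size(v, tol=1e-16):
    """Count entries of v that are non-negligible relative to the largest one."""
    biggest = max(abs(x) for x in v)
    return sum(1 for x in v if abs(x) >= tol * biggest)

# Synthetic localized vector of length 1000: a narrow bump centered at index 500.
n = 1000
v = [math.exp(-((i - 500) / 5.0) ** 2) for i in range(n)]
print(n, support_size(v))
```

For a vector like this one, the factor $n$ in the error bounds would effectively be replaced by a number that is more than an order of magnitude smaller.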

We  give  some  pointers  to  the  various  places  where  our  bounds  may  be  pessimistic. 

1. The small $|\mathrm{supp}|$ of an eigenvector can lead to smaller error bounds in Theorem 4.5.2
and Corollary 4.5.3. This would mean that $n$ can be replaced by $|\mathrm{supp}|$ in the terms
$22n\varepsilon$ and $11n^{3/2}$ that occur in (4.5.72) and (4.5.73) respectively.

2. The bounds of Corollaries 4.3.2 and 4.3.3 that are used frequently may also permit $n$
to be replaced by $|\mathrm{supp}|$.

3. The term $h(n)$ in (4.5.67) may be quite small. Indeed, if bisection is used for
computing the eigenvalues, then $h(n) = O(|\mathrm{supp}|)$.

4. The $\sqrt{n}$ term in (4.5.62) is really $\|\bar u_j\|_\infty^{-1}$ (see Theorem 3.2.3) and can be $O(1)$ if the
largest entry of the normalized eigenvector is $O(1)$. Consequently, our results will be
more accurate than our error bounds suggest.

4.5.5  Orthogonality  in  Extended  Precision  Arithmetic 

Suppose the user wants $d$ digits of accuracy, i.e., residual norms and dot products
of about $10^{-d}$ are acceptable. Until now, we have implicitly assumed that the arithmetic
precision in which we compute is identical to the acceptable level of accuracy, e.g., we desire
$O(\varepsilon)$ accuracy in our goals of (1.1.1) and (1.1.2) where $\varepsilon$ is the precision of the arithmetic.
Can a desired level of accuracy be guaranteed by arithmetic of a higher precision? For
example, the user may want single precision accuracy when computing in double precision,
or we may aim for double precision accuracy by computing in quadruple precision arithmetic.
The IEEE standard also specifies a Double-Extended precision format (sometimes referred
to as “80-bit” arithmetic) and on some machines, these extra precise computations may
be performed in hardware [2, 65]. Quadruple precision arithmetic is generally simulated in
software [91, 120].

In  order  to  get  single  precision  accuracy,  we  may  try  and  execute  Algorithm  X  in 
double  precision  arithmetic.  However,  there  are  cases  when  this  simple  strategy  will  not 
work. Consider Wilkinson’s matrix $W_{21}^{+}$ where the largest pair of eigenvalues agree to more
than  16  digits  (see  [136,  p.309]  for  more  details).  By  the  theory  developed  in  Section  4.5, 
the  corresponding  eigenvectors  computed  by  Algorithm  X  can  be  nearly  parallel  even  if  we 
compute  in  double  precision!  And  indeed  in  a  numerical  run,  we  observe  large  dot  products. 
Thus  we  cannot  use  the  doubled  precision  accuracy  in  a  naive  manner. 

We now indicate without proof that Algorithm X can easily be modified to deliver
results that are accurate to a desired accuracy when operating in arithmetic of doubled
precision, i.e., by slightly modifying Algorithm X, it almost always delivers eigenpairs
with $O(\sqrt{\varepsilon})$ residual norms and dot products when using arithmetic of precision
$\varepsilon$. The modification is that after computing the $LDL^T$ decomposition in Step 2 of
Algorithm X, random componentwise perturbations of $O(\sqrt{\varepsilon})$ should be made in the elements
of $L$ and $D$. This perturbation makes it unlikely that relative gaps between the eigenvalues
of the perturbed $LDL^T$ will be smaller than $O(\sqrt{\varepsilon})$. In many cases, the above effect can
be achieved just by performing the computation to get the $LDL^T$ decomposition in $\sqrt{\varepsilon}$
precision arithmetic. The computation of the eigenvalues and eigenvectors by Steps 3 and 4
of Algorithm X must, of course, be performed in the doubled precision. The results of
Theorem 4.5.6 and Corollary 4.5.4 now imply that both residual norms and dot products
are $O(\sqrt{\varepsilon})$.
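The modification described above amounts to a few lines of code. The sketch below is an assumption-laden illustration: the function name is hypothetical, the perturbations are drawn uniformly from $[-\sqrt{\varepsilon}, \sqrt{\varepsilon}]$ (the text only asks for random perturbations of size $O(\sqrt{\varepsilon})$), and $\varepsilon$ is taken to be the double precision machine epsilon.

```python
import math
import random
import sys

def perturb_representation(l, d, eps=sys.float_info.epsilon, seed=0):
    """Apply random componentwise relative perturbations of size sqrt(eps) to L and D."""
    rng = random.Random(seed)            # fixed seed keeps the sketch reproducible
    scale = math.sqrt(eps)

    def bump(x):
        return x * (1.0 + scale * rng.uniform(-1.0, 1.0))

    return [bump(x) for x in l], [bump(x) for x in d]

# Example: perturb a small factorization before Steps 3 and 4 are carried out.
l = [0.175, 0.207, 0.245]
d = [4.0, 3.3775, 2.855, 2.328]
lp, dp = perturb_representation(l, d)
```

Because the perturbations are relative and componentwise, each eigenvalue of the perturbed $LDL^T$ moves only by a relative amount of order $\sqrt{\varepsilon}$, which is why the residual norms remain $O(\sqrt{\varepsilon})$.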


4.6  Numerical  Results 

We  now  provide  numerical  evidence  to  verify  our  claims  of  the  last  two  sections. 
We  consider  two  types  of  matrices  with  the  following  distribution  of  eigenvalues. 

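Type 1. $n - 1$ eigenvalues uniformly distributed from $\varepsilon$ to $(n-1)\varepsilon$, and the $n$th eigenvalue
at 1, i.e.,

$$\lambda_i = i \cdot \varepsilon, \quad i = 1, 2, \ldots, n-1, \quad \text{and} \quad \lambda_n = 1.$$

Type 2. One eigenvalue at $\varepsilon$, $n - 2$ eigenvalues uniformly distributed from $1 + \sqrt{\varepsilon}$ to
$1 + (n-2)\sqrt{\varepsilon}$, and the last eigenvalue at 2, i.e.,

$$\lambda_1 = \varepsilon, \quad \lambda_i = 1 + (i-1) \cdot \sqrt{\varepsilon}, \quad i = 2, \ldots, n-1, \quad \text{and} \quad \lambda_n = 2.$$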
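The two eigenvalue distributions are easy to generate explicitly. The following sketch only builds the lists of eigenvalues; it does not reproduce the LAPACK test matrix generator. The function names are hypothetical, and $\varepsilon$ is assumed to be the double precision machine epsilon.

```python
import math
import sys

EPS = sys.float_info.epsilon

def type1_spectrum(n, eps=EPS):
    """n-1 eigenvalues i*eps for i = 1..n-1, and one eigenvalue at 1."""
    return [i * eps for i in range(1, n)] + [1.0]

def type2_spectrum(n, eps=EPS):
    """One eigenvalue at eps, n-2 eigenvalues 1 + (i-1)*sqrt(eps), and one at 2."""
    root = math.sqrt(eps)
    return [eps] + [1.0 + (i - 1) * root for i in range(2, n)] + [2.0]

lam1 = type1_spectrum(6)
lam2 = type2_spectrum(6)
# Relative gap between neighbors inside the type 2 cluster near 1.
cluster_gap = (lam2[2] - lam2[1]) / lam2[1]
```

In type 1 the small eigenvalues are tiny in absolute terms but have relative gaps of order 1, whereas in type 2 the clustered eigenvalues have relative gaps of about $\sqrt{\varepsilon}$, which is what drives the different orthogonality levels reported below.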


We generated matrices of the above type using the LAPACK test matrix generator [36],
which first forms a random dense symmetric matrix with the given spectrum;
Householder reduction of this dense matrix then yields a tridiagonal of the desired type.

In Table 4.1, we compare the times taken by our new algorithm with the LAPACK
and EISPACK implementations of inverse iteration on matrices of type 1. The $O(n^3)$
behavior of the LAPACK and EISPACK codes is seen in this table while Algorithm X takes
$O(n^2)$ time. Table 4.1 shows the new algorithm to be consistently faster: it
is about 3 times faster on a matrix of size 50 and nearly 23 times faster on a $1000 \times 1000$
matrix. As we proved in the last section, Algorithm X delivers vectors that are numerically
orthogonal, and this is seen in Table 4.2.

On matrices of type 2, our theory predicts that the vectors we compute will have
dot products of about $\sqrt{\varepsilon}$ (see Corollary 4.5.4). Indeed, as Table 4.3 shows, that is what
we observe. The times taken by Algorithm X to compute the vectors in this case are
approximately the same as the times listed in Table 4.1. Vectors that have dot products
of $O(\sqrt{\varepsilon})$ are sometimes referred to as a semi-orthogonal basis. In some cases, such a basis
may be as good as an orthogonal basis, see [114].

Matrix   Time(LAPACK)   Time(EISPACK)   Time(Alg. X)   Time(LAPACK) /
Size     (in s.)        (in s.)         (in s.)        Time(Alg. X)

  50         0.10           0.09            0.03            3.33
 100         0.45           0.34            0.07            6.43
 250         3.60           2.32            0.37            9.73
 500        19.88          11.21            1.35           14.73
 750        57.53          29.65            2.98           19.31
1000       124.98          60.81            5.51           22.68

Table 4.1: Timing results on matrices of type 1
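The growth rates claimed for Table 4.1 can be read off the data directly. The short sketch below recomputes the speedup column and estimates an empirical growth exponent between $n = 500$ and $n = 1000$ as $\log(t_2/t_1)/\log(n_2/n_1)$. The timings are copied from the table; the choice of these two sizes and the function name are illustrative assumptions.

```python
import math

sizes = [50, 100, 250, 500, 750, 1000]
lapack = [0.10, 0.45, 3.60, 19.88, 57.53, 124.98]
eispack = [0.09, 0.34, 2.32, 11.21, 29.65, 60.81]
alg_x = [0.03, 0.07, 0.37, 1.35, 2.98, 5.51]

# Speedup of Algorithm X over LAPACK, as in the last column of Table 4.1.
speedup = [t_l / t_x for t_l, t_x in zip(lapack, alg_x)]

def growth_exponent(times, i, j):
    """Empirical exponent p in time ~ n^p between sizes[i] and sizes[j]."""
    return math.log(times[j] / times[i]) / math.log(sizes[j] / sizes[i])

for name, times in (("LAPACK", lapack), ("EISPACK", eispack), ("Alg. X", alg_x)):
    print(name, round(growth_exponent(times, 3, 5), 2))
```

On these figures the exponent for Algorithm X is close to 2, while the inverse iteration codes grow noticeably faster, in line with the $O(n^2)$ versus $O(n^3)$ behavior described in the text.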


          $\max_i \|T \hat v_i - \hat\sigma_i^2 \hat v_i\|$                  $\max_{i \ne j} |\hat v_i^T \hat v_j|$
Matrix
Size     LAPACK         EISPACK        Alg. X         LAPACK         EISPACK        Alg. X

  50     1.6 · 10^-16   5.9 · 10^-15   1.2 · 10^-16   1.1 · 10^-15   2.8 · 10^-15   2.5 · 10^-15
 100     3.1 · 10^-17   1.5 · 10^-15   1.3 · 10^-17   1.1 · 10^-15   5.7 · 10^-15   2.0 · 10^-15
 250     1.1 · 10^-16   2.1 · 10^-14   1.1 · 10^-16   1.7 · 10^-15   1.4 · 10^-14   1.5 · 10^-14
 500     1.1 · 10^-16   4.9 · 10^-14   5.5 · 10^-18   3.5 · 10^-15   2.0 · 10^-14   4.2 · 10^-14
 750     1.1 · 10^-16   3.6 · 10^-14   3.2 · 10^-18   4.6 · 10^-15   3.9 · 10^-14   4.1 · 10^-14
1000     1.2 · 10^-17   5.1 · 10^-14   2.2 · 10^-16   4.4 · 10^-15   6.0 · 10^-14   8.3 · 10^-14

Table 4.2: Accuracy results on matrices of type 1


Matrix  Size 

max,  \\Tvi-  afvt\\ 

max.gj  \vfvj\ 

50 

7.2  •  10-16 

5.2  •  10-9 

100 

1.0  •  10“15 

3.9  •  10-9 

250 

1.5  •  10-16 

2.5  •  10-9 

500 

2.1  •  10“15 

2.1  •  10“9 

750 

2.6  •  10-15 

2.4-  10“9 

1000 

2.9  •  10-15 

1.7-  10“9 

Table  4.3:  Accuracy  results  on  matrices  of  type  2 
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A  nice  property  of  Algorithm  X  is  that  the  dot  products  of  the  vectors  computed 
can  be  predicted  quite  accurately,  based  solely  on  the  relative  separation  of  the  eigenvalues. 
As  exhibited  in  Sections  2.8  and  4.2,  existing  implementations  do  not  have  such  a  property. 
This feature is useful when larger dot products are acceptable, such as in the case of
semi-orthogonal vectors.

Algorithm X is a major step in obtaining an $O(n^2)$ algorithm for the symmetric
tridiagonal problem. However, it does not always deliver vectors that are numerically
orthogonal. In the next chapter, we investigate how to extend Algorithm X in order to
always achieve the desired accuracy while doing only $O(n^2)$ work.




Chapter  5 

Multiple  Representations 


In  the  previous  chapter,  we  saw  how  to  obtain  vectors  that  are  guaranteed  to 
be  numerically  orthogonal  when  eigenvalues  have  large  relative  gaps.  In  this  chapter  and 
the  next,  we  concentrate  our  energies  on  the  case  when  relative  gaps  are  smaller  than  a 
threshold,  say  1/n.  Such  eigenvalues  will  be  called  a  cluster  throughout  this  chapter. 

The  following  is  our  plan  of  attack: 

1. In Section 5.1, we present two examples which suggest that orthogonality may be
achieved by shifting the matrix close to a cluster and then forming a bidiagonal
factorization of the shifted matrix. The aim is to clearly distinguish between the individual
eigenvalues of the cluster so that we can treat each eigenvalue as isolated and compute
the corresponding eigenvector as before. If the bidiagonal factorization is “good”, the
computed vectors will be nearly orthogonal.

2. In Section 5.2, we list the properties that a “good” bidiagonal factorization must
satisfy. We call such a factorization a relatively robust representation. Section 5.2.1
introduces relative condition numbers that indicate when a representation is relatively
robust. In Section 5.2.3, we investigate factorizations of nearly singular tridiagonal
matrices and attempt to explain why these representations are almost always relatively
robust.

3. In Section 5.3, we explain why the vectors computed using different relatively robust
representations turn out to be numerically orthogonal. We do so by introducing a
representation tree which we use as a visual tool for our exposition. A representation
tree summarizes the computation and helps in relating the various representations to
each other.

4. In Section 5.4, we present Algorithm Y, which is an enhancement to Algorithm X that
was earlier presented in Chapter 4. Algorithm Y takes $O(n^2)$ time and handles the
remaining case of small relative gaps. Unlike Algorithm X, we do not have a proof of
correctness of Algorithm Y as yet. In all our numerical testing, which we present in
the next chapter, we have always found it to deliver accurate answers.


5.1  Multiple  Representations 

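Example 5.1.1 [Small Relative Gaps.] Consider the matrix

\[
T_0 = \begin{bmatrix}
.520000005885958 & .519230209355285 & & \\
.519230209355285 & .589792290767499 & .36719192898916 & \\
 & .36719192898916 & 1.89020772569828 & 2.7632618547882 \cdot 10^{-8} \\
 & & 2.7632618547882 \cdot 10^{-8} & 1.00000002235174
\end{bmatrix}
\]

with eigenvalues

\[
\lambda_1 \approx \varepsilon, \quad \lambda_2 \approx 1 + \sqrt{\varepsilon}, \quad
\lambda_3 \approx 1 + 2\sqrt{\varepsilon}, \quad \lambda_4 \approx 2.0,
\]

where $\varepsilon \approx 2.2 \times 10^{-16}$ (all these results are in IEEE double precision arithmetic). Since

\[
\mathrm{relgap}_2(\lambda_2, \lambda_3) \approx \sqrt{\varepsilon},
\]

Corollary 4.5.4 implies that the vectors computed by Algorithm X have a dot product of

\[
|v_2^T v_3| = O(n\varepsilon/\sqrt{\varepsilon}) = O(n\sqrt{\varepsilon}).
\]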


The challenge is to obtain approximations to $v_2$ and $v_3$ that are numerically orthogonal
without resorting to Gram-Schmidt or a similar technique that explicitly orthogonalizes
vectors.

Note that the eigenvectors of $T_0$ are identical to the eigenvectors of $T_0 - I$. We
can form

\[
T_0 - I = L_0 D_0 L_0^T \tag{5.1.1}
\]

to get

\[
\mathrm{diag}(D_0) = \begin{bmatrix}
-.4799999941140420 \\
.1514589857947483 \\
3.074504323352656 \cdot 10^{-7} \\
1.986821250068485 \cdot 10^{-8}
\end{bmatrix}, \qquad
\mathrm{diag}(L_0, -1) = \begin{bmatrix}
-1.081729616088125 \\
2.424365428451800 \\
.08987666186707277
\end{bmatrix},
\]

where we have used the MATLAB notation “diag($L_0$, −1)” to give the subdiagonal entries
of $L_0$. Note that the interior eigenvalues of this shifted matrix are $\sqrt{\varepsilon}$ and $2\sqrt{\varepsilon}$. The relative
gap between these numbers is now large! Can we exploit these large relative gaps as we did
in the previous chapter?

Suppose $LDL^T$ is a factorization where small relative changes in $L$ and $D$ result
in tiny changes in all its eigenvalues and a corresponding small change in the eigenvectors.
By revisiting the proofs that lead to Corollary 4.5.4, we discover that in such a case if
Steps 3 and 4 of Algorithm X are applied to $LDL^T$, then the computed vectors will be
nearly orthogonal. Indeed, if we apply these steps to $L_0 D_0 L_0^T$, the vectors computed are

\[
v_2 = \begin{bmatrix}
.4999999955491866 \\
.4622227251939223 \\
-.1906571596350473 \\
.7071067841882251
\end{bmatrix}, \qquad
v_3 = \begin{bmatrix}
.4999999997942006 \\
.4622227434674882 \\
-.1906571264658161 \\
-.7071067781848689
\end{bmatrix},
\]

and $v_2^T v_3 = 2\varepsilon$! It appears to be a miracle that by considering a translate of the original $T$
that makes the relative gap large, we are able to compute eigenvectors that are orthogonal
to working accuracy. □
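
The factorization (5.1.1) is easy to reproduce. The following is a minimal Python sketch under stated assumptions, not the thesis's code: the helper name `ldlt_tridiag` is ours, and the matrix entries and the two vectors are the rounded decimal values printed in Example 5.1.1, so the pivots agree with the printed ones only up to that rounding.

```python
def ldlt_tridiag(diag, off, shift=0.0):
    """Return (d, l) with T - shift*I = L D L^T, where T is symmetric
    tridiagonal, D = diag(d) and L is unit lower bidiagonal with
    subdiagonal l."""
    d = [diag[0] - shift]
    l = []
    for i, beta in enumerate(off):
        l.append(beta / d[i])                           # L(i+1, i)
        d.append((diag[i + 1] - shift) - l[i] * beta)   # next pivot
    return d, l

# T_0 of Example 5.1.1 (entries as printed).
diag0 = [0.520000005885958, 0.589792290767499,
         1.89020772569828, 1.00000002235174]
off0 = [0.519230209355285, 0.36719192898916, 2.7632618547882e-8]

d0, l0 = ldlt_tridiag(diag0, off0, shift=1.0)
print("diag(D_0)     =", d0)
print("diag(L_0, -1) =", l0)

# Eigenvector approximations printed in Example 5.1.1.
v2 = [0.4999999955491866, 0.4622227251939223,
      -0.1906571596350473, 0.7071067841882251]
v3 = [0.4999999997942006, 0.4622227434674882,
      -0.1906571264658161, -0.7071067781848689]
dot23 = sum(a * b for a, b in zip(v2, v3))
print("v2 . v3 =", dot23)   # expected to be a small multiple of machine epsilon
```

The last two pivots are formed by subtracting nearly equal numbers, which is why they carry fewer correct digits than the inputs when the inputs themselves are rounded.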


Clearly, success in the above example is due to the property of the decomposition
$L_0 D_0 L_0^T$ by which all of its eigenvalues change by small relative amounts under small
componentwise perturbations. In Section 4.3, we saw that every positive definite tridiagonal
$LDL^T$ enjoys this benign property. However, not every decomposition of an indefinite
tridiagonal shares this property; see Example 5.2.3 for one such decomposition. But the
important question is: can we always find such “relatively robust” representations “near”
a cluster?

One distinguishing feature of the $LDL^T$ decomposition of a positive definite matrix
is the lack of any element growth. For the decomposition $T - \mu I = LDL^T$, let us define¹

\[
\text{Element Growth} = \max_i \left| \frac{D(i,i)}{T(i,i) - \mu} \right|^{1/2}. \tag{5.1.2}
\]

¹ We do not have a strong preference for this particular definition of element growth; indeed, it may be
possible to get a “better” definition.

When $T - \mu I$ is positive definite, the element growth always equals 1 (see Lemma 4.4.1).
For the matrix $T_0$ of Example 5.1.1, we can verify that the element growth at $\mu = 1$ also
equals 1. We might suspect a correlation between the lack of element growth and high
relative accuracy. We speculate further on this correlation in Section 5.2.3.
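
The element growth (5.1.2) can be evaluated alongside the factorization. The sketch below is a hedged illustration: the function names are ours, the formula is (5.1.2) as read from the text, and the second shift is the one used later in Example 5.2.3. With the printed entries of $T_0$ the growth at $\mu = 1$ comes out as 1, while the second shift, which lies very close to an eigenvalue of the leading 2 x 2 submatrix, produces a large value.

```python
import math

def ldlt_pivots(diag, off, mu):
    """Pivots D(i,i) of T - mu*I = L D L^T for a symmetric tridiagonal T."""
    d = [diag[0] - mu]
    for i, beta in enumerate(off):
        d.append((diag[i + 1] - mu) - beta * beta / d[i])
    return d

def element_growth(diag, off, mu):
    """max_i |D(i,i) / (T(i,i) - mu)|^(1/2), following (5.1.2)."""
    d = ldlt_pivots(diag, off, mu)
    return max(math.sqrt(abs(di / (ti - mu))) for di, ti in zip(d, diag))

# T_0 of Example 5.1.1 (entries as printed).
diag0 = [0.520000005885958, 0.589792290767499,
         1.89020772569828, 1.00000002235174]
off0 = [0.519230209355285, 0.36719192898916, 2.7632618547882e-8]

growth_at_1 = element_growth(diag0, off0, 1.0)
growth_bad = element_growth(diag0, off0, 1.075297677018139)
print(growth_at_1, growth_bad)
```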

Suppose there is a large element growth when forming $LDL^T$. Small relative
perturbations in the large elements of $L$ and $D$ result in large absolute perturbations. Thus
it appears unlikely that the eigenvalues of such an $LDL^T$ can be computed to absolute
accuracy. Our goal of computing the small eigenvalues of $LDL^T$ to high relative accuracy
seems to hold out even less hope. However, as the following example shows, we can be in a
peculiar but lucky situation.


Example 5.1.2 [Large Element Growth.] Consider the matrix

\[
T_1 = \begin{bmatrix}
1.00000001117587 & .707106781186547 & & \\
.707106781186547 & .999999977648258 & .707106781186546 & \\
 & .707106781186546 & 1.00000003352761 & 1.05367121277235 \cdot 10^{-8} \\
 & & 1.05367121277235 \cdot 10^{-8} & 1.00000002235174
\end{bmatrix}
\]

whose eigenvalues are approximately equal to those of $T_0$ (see Example 5.1.1). To get
orthogonal approximations to the interior eigenvectors we can try the same trick as before.
However, when we form $T_1 - I = L_1 D_1 L_1^T$, we get

\[
\mathrm{diag}(D_1) = \begin{bmatrix}
1.117587089538574 \cdot 10^{-8} \\
-4.473924266666669 \cdot 10^{7} \\
4.470348380358755 \cdot 10^{-8} \\
1.986821493746602 \cdot 10^{-8}
\end{bmatrix}, \qquad
\mathrm{diag}(L_1, -1) = \begin{bmatrix}
6.327084877781628 \cdot 10^{7} \\
-1.580506693551128 \cdot 10^{-8} \\
.2357022791274500
\end{bmatrix}.
\]

There is appreciable element growth in forming the (2,2) element of $D_1$, and we may be
skeptical of using this factorization to get the desired results. But there is no harm in trying
to apply Steps 3 and 4 of Algorithm X to $L_1 D_1 L_1^T$ to compute the interior eigenvectors. By
doing so, we get

\[
v_2 = \begin{bmatrix}
.4999999981373546 \\
2.634178061370114 \cdot 10^{-9} \\
-.4999999981373549 \\
.7071067838207259
\end{bmatrix}, \qquad
v_3 = \begin{bmatrix}
-.5000000018626457 \\
-1.317089024797209 \cdot 10^{-8} \\
.5000000018626453 \\
.7071067785523688
\end{bmatrix},
\]

and miraculously $|v_2^T v_3| < 3\varepsilon$! The corresponding residual norms are also small! □

The above example appears to be an anomaly. Due to the large entry $D_1(2,2)$,
we should not even expect absolute accuracy in the computed eigenvalues and eigenvectors.
Note that in the above example we used $L_1 D_1 L_1^T$ to compute only $v_2$ and $v_3$. It turns
out that small componentwise perturbations in $L_1$ and $D_1$ result in small relative changes
in only the two small eigenvalues of $L_1 D_1 L_1^T$, but large relative (and hence, absolute)
changes in the extreme eigenvalues. We get inaccurate answers if we attempt to compute
approximations to $v_1$ and $v_4$ using $L_1 D_1 L_1^T$. These extreme eigenvectors must be computed
using a different representation, say the Cholesky decomposition of $T_1$. We investigate
this seemingly surprising phenomenon in Section 5.2.1. More details on the two examples
discussed above may be found in Case Study C.
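
The claims of Example 5.1.2 about orthogonality and residuals can be checked directly from the printed data. This is a hedged sketch: the helper names are ours, the two vectors are the printed approximations to the interior eigenvectors (entries assigned to $v_2$ and $v_3$ so that each vector has unit norm), and the shifted diagonal is formed by subtracting 1 from the printed diagonal of $T_1$, so the residuals are small only up to the rounding of the printed entries.

```python
import math

# T_1 - I from Example 5.1.2: printed diagonal minus the shift 1.
diag = [1.00000001117587 - 1.0, 0.999999977648258 - 1.0,
        1.00000003352761 - 1.0, 1.00000002235174 - 1.0]
off = [0.707106781186547, 0.707106781186546, 1.05367121277235e-8]

def tridiag_matvec(diag, off, x):
    """y = T x for the symmetric tridiagonal T given by (diag, off)."""
    n = len(diag)
    y = [diag[i] * x[i] for i in range(n)]
    for i in range(n - 1):
        y[i] += off[i] * x[i + 1]
        y[i + 1] += off[i] * x[i]
    return y

def rayleigh_and_residual(diag, off, v):
    """Rayleigh quotient rq of v and the residual norm ||T v - rq v|| / ||v||."""
    w = tridiag_matvec(diag, off, v)
    vv = sum(x * x for x in v)
    rq = sum(x * y for x, y in zip(v, w)) / vv
    res = math.sqrt(sum((y - rq * x) ** 2 for x, y in zip(v, w)) / vv)
    return rq, res

v2 = [0.4999999981373546, 2.634178061370114e-9,
      -0.4999999981373549, 0.7071067838207259]
v3 = [-0.5000000018626457, -1.317089024797209e-8,
      0.5000000018626453, 0.7071067785523688]

rq2, res2 = rayleigh_and_residual(diag, off, v2)
rq3, res3 = rayleigh_and_residual(diag, off, v3)
dot23 = sum(a * b for a, b in zip(v2, v3))
print(rq2, res2)
print(rq3, res3)
print(dot23)
```

The Rayleigh quotients approximate the two small eigenvalues of the shifted matrix, roughly $\sqrt{\varepsilon}$ and $2\sqrt{\varepsilon}$.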


5.2  Relatively  Robust  Representations  (RRRs) 


In Algorithm X and the two examples of the previous section, triangular
factorizations of translates of $T$ allow us to compute orthogonal eigenvectors whenever the
eigenvalues of these factorizations have large relative gaps. We now identify the crucial
property of these decompositions that enables such computation.

Informally, a relatively robust representation is a set of numbers that define a matrix
$A$ such that small componentwise perturbations in these numbers result in a small relative
change in the eigenvalues, and the change in the eigenvectors is inversely proportional to
the relative gaps in the eigenvalues. For example, a unit lower bidiagonal $L$ and a positive
diagonal matrix $D$ are the triangular factors of the tridiagonal matrix $LDL^T$, and form
a relatively robust representation as shown by the theory outlined in Section 4.3. In the
following, we denote the $j$th eigenvalue and eigenvector of $A$ by $\lambda_j$ and $v_j$ respectively,
while the corresponding perturbed eigenvalue and eigenvector are denoted by $\lambda_j + \delta\lambda_j$ and
$v_j + \delta v_j$. More precisely,


Definition 5.2.1 A Relatively Robust Representation or RRR is a set of numbers $\{x_i\}$ that
define a matrix $A$ such that if $x_i$ is perturbed to $x_i(1 + \varepsilon_i)$, then for $j = 1, 2, \ldots, n$,

\[
\frac{|\delta\lambda_j|}{|\lambda_j|} = O\Big(\sum_i \varepsilon_i\Big), \qquad
|\sin\angle(v_j, v_j + \delta v_j)| = O\left(\frac{\sum_i \varepsilon_i}{\mathrm{relgap}(\lambda_j, \{\lambda_k \mid k \ne j\})}\right),
\]

where relgap is the relative gap as defined in Section 4.3.

Typically, the RRRs we consider will be triangular factors $L$ and $D$ of the shifted
matrix $T - \mu I$. We will frequently refer to such a factorization as a representation of $T$
(based) at the shift $\mu$. Also, for the sake of brevity, we will often refer to the underlying
matrix $LDL^T$ as the RRR (instead of the individual factors $L$ and $D$). Sometimes a
representation may determine only a few but not all its eigenvalues to high relative accuracy.
For example, a representation may only determine the eigenvalues $\lambda_j, \lambda_{j+1}, \ldots, \lambda_k$ to high
relative accuracy. In such a case, we say that we have a partial RRR and denote it by
RRR($j, j+1, \ldots, k$). In the next section, we will show that the representation of $T_0$ at the
shift 1 is an RRR while that of $T_1$ at 1 is a partial RRR(2,3), where $T_0$ and $T_1$ are the
matrices of Examples 5.1.1 and 5.1.2 respectively.

5.2.1  Relative  Condition  Numbers 

We now find a criterion to judge if the factors $L$ and $D$ form an RRR. Instead
of dealing with $LDL^T$ we switch to a Cholesky-like decomposition for the purpose of our
analysis,

\[
LDL^T = \tilde{L}\,\Omega\,\tilde{L}^T,
\]

where $\Omega = \mathrm{sign}(D)$ is a diagonal matrix with $\pm 1$ entries that explicitly captures the “indefiniteness”
of the matrix, and $\tilde{L}$ is lower triangular. Note that $\Omega$ is the identity matrix when $LDL^T$ is
positive definite. We made a similar switch in Chapter 4 where we analyzed the perturbation
properties of the Cholesky factor while performing our computations with $LDL^T$. The
relationship between these alternate representations is easy to see and is given by

\[
\tilde{L} = L\,|D|^{1/2}.
\]

By Theorem 4.4.1, both these representations are similar in terms of their behavior under
small relative perturbations.

Thus we consider the factorization

\[
T = \tilde{L}\,\Omega\,\tilde{L}^T. \tag{5.2.4}
\]

As shown in Section 4.3, if we make small componentwise perturbations in $\tilde{L}$, the perturbed
bidiagonal may be represented as $D_1 \tilde{L} D_2$ where $D_1$ and $D_2$ are diagonal matrices close to the
identity matrix. We are interested in answering the following question. Are the eigenvalues
of the perturbed matrix

\[
T + \delta T = D_1 \tilde{L} D_2\,\Omega\,D_2 \tilde{L}^T D_1
\]

relatively close to the eigenvalues of $\tilde{L}\Omega\tilde{L}^T$ and if so, when? Note that the answer is always
in the affirmative when $\Omega = I$, irrespective of the individual entries in $\tilde{L}$.

By Eisenstat and Ipsen’s Theorem 2.1 in [49], the eigenvalues of $D_1^T A D_1$ are small
relative perturbations of the eigenvalues of $A$ if $\|D_1\| \approx 1$. By applying their result to our
case, we get

\[
\frac{\lambda_j[\tilde{L} D_2 \Omega D_2 \tilde{L}^T]}{\|D_1^{-1}\|^2} \;\le\; \lambda_j[T + \delta T] \;\le\;
\lambda_j[\tilde{L} D_2 \Omega D_2 \tilde{L}^T] \cdot \|D_1\|^2, \tag{5.2.5}
\]

where $\lambda_j[A]$ denotes the $j$th eigenvalue of $A$.

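We now write $D_2$ as $I + \Delta_2$ where $\|\Delta_2\| = O(\varepsilon)$. In proving the following result,
we use the fact that an eigenvalue of a matrix is a continuous function of its entries.

Theorem 5.2.1 Let $\lambda = \lambda_j$ be a simple eigenvalue of $\tilde{L}\Omega\tilde{L}^T$ with eigenvector $v = v_j$. Let
$\lambda + \delta\lambda$ be the corresponding eigenvalue of $\tilde{L}(I + \Delta_2)\Omega(I + \Delta_2)\tilde{L}^T$. Then

\[
\delta\lambda = v^T \tilde{L}(\Delta_2\Omega + \Omega\Delta_2)\tilde{L}^T v + O(\|\Delta_2\|^2 \cdot \|\tilde{L}^T v\|^2), \tag{5.2.6}
\]

\[
\frac{|\delta\lambda|}{|\lambda|} \le \frac{|v^T \tilde{L}\tilde{L}^T v|}{|v^T \tilde{L}\Omega\tilde{L}^T v|} \cdot 2\|\Delta_2\|
+ O\left(\frac{|v^T \tilde{L}\tilde{L}^T v|}{|v^T \tilde{L}\Omega\tilde{L}^T v|} \cdot \|\Delta_2\|^2\right). \tag{5.2.7}
\]

Proof. By continuity of the eigenvalues, we can write

\[
\big(\tilde{L}(I + \Delta_2)\Omega(I + \Delta_2)\tilde{L}^T\big)(v + \delta v) = (\lambda + \delta\lambda)(v + \delta v).
\]

Expanding the terms and using $\tilde{L}\Omega\tilde{L}^T v = \lambda v$, we get

\[
(\tilde{L}\Delta_2\Omega\tilde{L}^T + \tilde{L}\Omega\Delta_2\tilde{L}^T + \tilde{L}\Delta_2\Omega\Delta_2\tilde{L}^T) \cdot v + \tilde{L}\Omega\tilde{L}^T \cdot \delta v
+ (\tilde{L}\Delta_2\Omega\tilde{L}^T + \tilde{L}\Omega\Delta_2\tilde{L}^T + \tilde{L}\Delta_2\Omega\Delta_2\tilde{L}^T) \cdot \delta v
= \lambda\,\delta v + \delta\lambda \cdot v + \delta\lambda \cdot \delta v.
\]

Premultiplying both sides by $v^T$,

\[
v^T \tilde{L}\Delta_2\Omega\tilde{L}^T v + v^T \tilde{L}\Omega\Delta_2\tilde{L}^T v + \lambda v^T \delta v + O(\|\tilde{L}^T v\|^2 \cdot \|\Delta_2\|^2) = \lambda v^T \delta v + \delta\lambda,
\]

and by canceling the common term on both sides, we obtain (5.2.6). Dividing both sides
by the eigenvalue $\lambda$ and taking norms yields the result (5.2.7). □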
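
The last step of the proof, written out as a worked derivation. It uses only facts stated above: $\Delta_2$ and $\Omega$ are diagonal (so they commute), $\Omega$ has $\pm 1$ entries (so $\|\Omega\Delta_2\| = \|\Delta_2\|$), and $v^T v = 1$ (so $\lambda = v^T \tilde{L}\Omega\tilde{L}^T v$). Second-order terms are dropped.

```latex
\begin{align*}
|\delta\lambda|
  &\approx \bigl| v^T \tilde{L}\,(\Delta_2\Omega + \Omega\Delta_2)\,\tilde{L}^T v \bigr|
   = 2\,\bigl| (\tilde{L}^T v)^T\,\Omega\Delta_2\,(\tilde{L}^T v) \bigr| \\
  &\le 2\,\|\Omega\Delta_2\|\;\|\tilde{L}^T v\|^2
   = 2\,\|\Delta_2\|\; v^T \tilde{L}\tilde{L}^T v, \\[4pt]
\frac{|\delta\lambda|}{|\lambda|}
  &\le 2\,\|\Delta_2\|\;
     \frac{v^T \tilde{L}\tilde{L}^T v}{\bigl| v^T \tilde{L}\,\Omega\,\tilde{L}^T v \bigr|}.
\end{align*}
```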

We call the ratio of the Rayleigh quotients given in (5.2.7) the relative condition
number of $\lambda_j$, and denote it by $\kappa_{rel}$, i.e.,

\[
\kappa_{rel}(\lambda_j) = \frac{|v_j^T \tilde{L}\tilde{L}^T v_j|}{|v_j^T \tilde{L}\Omega\tilde{L}^T v_j|}. \tag{5.2.8}
\]

By combining (5.2.5) and (5.2.7) and using the above notation, we get

\[
\{1 - 2\|I - D_2\| \cdot \kappa_{rel}(\lambda_j[T])\}\,\frac{\lambda_j[T]}{\|D_1^{-1}\|^2}
\;\le\; \lambda_j[T + \delta T] \;\le\;
\lambda_j[T]\,\|D_1\|^2\,\{1 + 2\|I - D_2\| \cdot \kappa_{rel}(\lambda_j[T])\}, \tag{5.2.9}
\]

which is correct to first-order terms.

We draw the reader’s attention to the following facts:

1. Unlike the case of $\Omega = I$ where all eigenvalues are relatively robust (see Section 4.3),
here a condition number measures the relative robustness of an eigenvalue.

2. The relative condition number $\kappa_{rel}$ is different for each eigenvalue. Thus some
eigenvalues may be determined to high relative accuracy whereas others may not be. See
Example 5.2.2 for such an example.

3. A uniform bound for $\kappa_{rel}$ is $\|\tilde{L}\| \cdot \|\tilde{L}^{-1}\|$, and this bound holds for all eigenvalues
(see (5.2.13) below). However this bound can be a severe over-estimate as is shown
by the examples given later in Section 5.2.2.

4. Note that we did not constrain $\tilde{L}$ to a bidiagonal form in deriving (5.2.9). However,
relative perturbations in the individual entries of a dense matrix $\tilde{L}$ cannot be
expressed as $D_1 \tilde{L} D_2$. There is only a restricted class of matrices whose componentwise
perturbations can be written in the form $D_1 \tilde{L} D_2$. Matrices that belong to this class
are completely characterized by the sparsity pattern of their non-zeros and have been
called the set of biacyclic matrices, a term coined by Demmel and Gragg in [33].
Besides bidiagonal matrices, the twisted factors introduced in Section 3.1 and triangular
factors of arrowhead matrices (see [73, p.175]) belong to this class of matrices. Thus
the theory developed in this section is applicable not only to a bidiagonal matrix but
also to any biacyclic matrix.


By defining the vector

\[
f_j = \frac{\tilde{L}^T v_j}{\sqrt{|\lambda_j|}}, \tag{5.2.10}
\]

the relative condition number defined in (5.2.8) may be simply written as a norm,

\[
\kappa_{rel}(\lambda_j) = f_j^T f_j = \|f_j\|^2. \tag{5.2.11}
\]

Note that the following relationships hold,

\[
\tilde{L}^T v_j = f_j \sqrt{|\lambda_j|}, \qquad \tilde{L}\,\Omega f_j = v_j\,\mathrm{sign}(\lambda_j)\sqrt{|\lambda_j|},
\]

\[
\text{and} \quad f_j^T \Omega f_j = \mathrm{sign}(\lambda_j), \qquad v_j^T v_j = 1,
\]

and so we call $f_j$ the $j$th right $\Omega$-singular vector of $\tilde{L}$. Since $\tilde{L}\Omega\tilde{L}^T v_j = \lambda_j v_j$, and by (5.2.10),
$f_j$ may alternately be expressed as

\[
f_j = \mathrm{sign}(\lambda_j)\sqrt{|\lambda_j|} \cdot \Omega \tilde{L}^{-1} v_j. \tag{5.2.12}
\]

(5.2.10) and (5.2.12) suggest a bound for $f_j^T f_j$,

\[
\kappa_{rel}(\lambda_j) = f_j^T f_j \le \mathrm{cond}(\tilde{L}) = \|\tilde{L}\| \cdot \|\tilde{L}^{-1}\|, \quad \forall j. \tag{5.2.13}
\]

However the above bound is rather pessimistic since we want to compute the small
eigenvalues of $\tilde{L}\Omega\tilde{L}^T$ to high relative accuracy when the matrix is nearly singular, i.e., $\tilde{L}$ is
ill-conditioned.

In [116], Parlett also arrived at such condition numbers for a bidiagonal $\tilde{L}$ but by
using calculus.

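Theorem 5.2.2 (Parlett [116, Thm. 2]) Let $\tilde{L}$ be a bidiagonal matrix as in (4.3.3), with
$a_k \ne 0$, $b_k \ne 0$ and let $\Omega = \mathrm{diag}(\omega_1, \ldots, \omega_n)$ with $\omega_j = \pm 1$. Let $(\lambda, v)$ denote a particular
eigenpair of $\tilde{L}\Omega\tilde{L}^T$ and let $f$ be as defined in (5.2.10). Then by writing $\lambda = \mathrm{sign}(\lambda)\theta^2$, since
$\lambda \ne 0$,

\[
\text{(a)} \quad \frac{\partial\theta}{\partial a_k} \cdot \frac{a_k}{\theta}
= \sum_{i=1}^{k} v(i)^2 - \mathrm{sign}(\lambda)\sum_{j=1}^{k-1} \omega_j f(j)^2
= \mathrm{sign}(\lambda)\sum_{m=k}^{n} \omega_m f(m)^2 - \sum_{l=k+1}^{n} v(l)^2,
\]

\[
\text{(b)} \quad \frac{\partial\theta}{\partial b_k} \cdot \frac{b_k}{\theta}
= \mathrm{sign}(\lambda)\sum_{i=1}^{k} \omega_i f(i)^2 - \sum_{j=1}^{k} v(j)^2
= \sum_{m=k+1}^{n} v(m)^2 - \mathrm{sign}(\lambda)\sum_{l=k+1}^{n} \omega_l f(l)^2.
\]

For more work of a similar flavor, see [34].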

5.2.2  Examples 

We now examine relative condition numbers of some factorizations

\[
T - \mu I = \tilde{L}\,\Omega\,\tilde{L}^T,
\]

where $T$ is tridiagonal and $\tilde{L}$ is lower bidiagonal. In particular, we show that

i. The representation of $T_0$ at 1 is an RRR, where $T_0$ is as given in Example 5.1.1.

ii. The representation of $T_1$ at 1 is a partial RRR, where $T_1$ is the matrix of Example 5.1.2.

iii. A representation of $T - \mu I$ at an arbitrary shift $\mu$ may not determine any eigenvalue
to high relative accuracy.


Example 5.2.1 [An RRR (All Eigenvalues are Relatively Well-Conditioned).]

Consider the $L_0 D_0 L_0^T$ decomposition where $D_0$ and $L_0$ are as in Example 5.1.1. The
corresponding Cholesky-like decomposition

\[
\tilde{L}_0\,\Omega\,\tilde{L}_0^T = L_0 D_0 L_0^T
\]

has

\[
\mathrm{diag}(\tilde{L}_0) = \begin{bmatrix}
.6928203187797266 \\
.3891773192193351 \\
5.544821298610673 \cdot 10^{-4} \\
1.409546469637835 \cdot 10^{-4}
\end{bmatrix}, \qquad
\mathrm{diag}(\tilde{L}_0, -1) = \begin{bmatrix}
-.7494442574516461 \\
.9435080382529064 \\
4.983500289685748 \cdot 10^{-5}
\end{bmatrix},
\]

while $\mathrm{diag}(\Omega) = (-1, 1, 1, 1)$. The eigenvalues of $\tilde{L}_0\Omega\tilde{L}_0^T$ are approximately

\[
\lambda_1 = -1, \quad \lambda_2 = 1.49 \cdot 10^{-8}, \quad \lambda_3 = 2.98 \cdot 10^{-8}, \quad \lambda_4 = 1,
\]

and

\[
\mathrm{cond}(\tilde{L}_0) = \|\tilde{L}_0\| \cdot \|\tilde{L}_0^{-1}\| = 9.46 \cdot 10^{3}.
\]

We find the $\Omega$-right singular vectors $f_j$ to be

\[
f_1 = \begin{bmatrix} 1.01 \\ -.144 \\ 7.5 \cdot 10^{-5} \\ -5.3 \cdot 10^{-13} \end{bmatrix}, \quad
f_2 = \begin{bmatrix} 8.8 \cdot 10^{-5} \\ -3.1 \cdot 10^{-4} \\ .577 \\ -.816 \end{bmatrix}, \quad
f_3 = \begin{bmatrix} 1.2 \cdot 10^{-4} \\ -4.4 \cdot 10^{-4} \\ .816 \\ .577 \end{bmatrix}, \quad
\text{and} \quad
f_4 = \begin{bmatrix} -.144 \\ 1.01 \\ 5.2 \cdot 10^{-4} \\ 3.7 \cdot 10^{-12} \end{bmatrix},
\]

and by (5.2.11), the individual relative condition numbers are

\[
\kappa_{rel}(\lambda_1) = 1.042, \quad \kappa_{rel}(\lambda_2) = 1.000, \quad
\kappa_{rel}(\lambda_3) = 1.000, \quad \kappa_{rel}(\lambda_4) = 1.042.
\]

All the relative condition numbers are close to 1 and explain our success in computing
orthogonal eigenvectors as exhibited in Example 5.1.1. □

In the above example, we found the bidiagonal factorization of a nearly singular
tridiagonal to be an RRR. In our numerical experience, the above situation appears to be
typical. However, sometimes the factorization may determine only a few eigenvalues to high
relative accuracy. Somewhat surprisingly, eigenvalues that are small in magnitude may be
determined to high relative accuracy but not the large ones!


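Example 5.2.2 [Partial RRR (Only Small Eigenvalues are Well-Conditioned).]

Consider the factors $L_1$ and $D_1$ given in Example 5.1.2. The corresponding Cholesky-like
decomposition

\[
\tilde{L}_1\,\Omega_1\,\tilde{L}_1^T = L_1 D_1 L_1^T
\]

has

\[
\mathrm{diag}(\tilde{L}_1) = \begin{bmatrix}
1.05715987472128 \cdot 10^{-4} \\
6.68874025674659 \cdot 10^{3} \\
2.11431974944257 \cdot 10^{-4} \\
1.40954648562579 \cdot 10^{-4}
\end{bmatrix}, \qquad
\mathrm{diag}(\tilde{L}_1, -1) = \begin{bmatrix}
6.68874025674659 \cdot 10^{3} \\
-1.05715987472128 \cdot 10^{-4} \\
4.98349983747794 \cdot 10^{-5}
\end{bmatrix},
\]

and $\mathrm{diag}(\Omega_1) = (1, -1, 1, 1)$. The eigenvalues of $\tilde{L}_1\Omega_1\tilde{L}_1^T$ are approximately

\[
\lambda_1 = -1, \quad \lambda_2 = 1.49 \cdot 10^{-8}, \quad \lambda_3 = 2.98 \cdot 10^{-8}, \quad \lambda_4 = 1,
\]

and

\[
\mathrm{cond}(\tilde{L}_1) = \|\tilde{L}_1\| \cdot \|\tilde{L}_1^{-1}\| = 1.37 \cdot 10^{8}.
\]

The $\Omega$-right singular vectors for the two small eigenvalues are

\[
f_2 = \begin{bmatrix} .577 \\ .577 \\ -.577 \\ .816 \end{bmatrix}, \quad \text{and} \quad
f_3 = \begin{bmatrix} .816 \\ .816 \\ -.816 \\ -.577 \end{bmatrix},
\]

and the corresponding relative condition numbers are

\[
\kappa_{rel}(\lambda_2) = 1.666, \quad \kappa_{rel}(\lambda_3) = 2.333.
\]

Thus the two eigenvalues that are small in magnitude are relatively well conditioned but
for the extreme eigenvalues we have

\[
f_1 = \begin{bmatrix} -4.7 \cdot 10^{3} \\ -4.7 \cdot 10^{3} \\ 1.1 \cdot 10^{-4} \\ -7.4 \cdot 10^{-13} \end{bmatrix}, \quad \text{and} \quad
f_4 = \begin{bmatrix} 4.7 \cdot 10^{3} \\ 4.7 \cdot 10^{3} \\ 1.1 \cdot 10^{-4} \\ 7.4 \cdot 10^{-13} \end{bmatrix},
\]

and

\[
\kappa_{rel}(\lambda_1) = 4.5 \cdot 10^{7}, \quad \kappa_{rel}(\lambda_4) = 4.5 \cdot 10^{7}!
\]

□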
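
The condition numbers of the two small eigenvalues in Example 5.2.2 can be recomputed without forming $\tilde{L}_1$ explicitly. Since $\tilde{L} = L|D|^{1/2}$, writing $w = L^T v$ gives $v^T\tilde{L}\tilde{L}^T v = \sum_i |D_i| w_i^2$ and $v^T\tilde{L}\Omega\tilde{L}^T v = \sum_i D_i w_i^2$, so (5.2.8) is a ratio of two weighted sums. The sketch below is a hedged illustration: the helper names are ours, and the inputs are the printed entries of $T_1$ and the printed vectors of Example 5.1.2, so the results match the reported 1.666 and 2.333 only up to that rounding.

```python
def ldlt_tridiag(diag, off):
    """Factor a symmetric tridiagonal matrix as L D L^T (L unit lower bidiagonal)."""
    d, l = [diag[0]], []
    for i, beta in enumerate(off):
        l.append(beta / d[i])
        d.append(diag[i + 1] - l[i] * beta)
    return d, l

def kappa_rel(d, l, v):
    """Relative condition number (5.2.8) evaluated as
    sum |D_i| w_i^2 / |sum D_i w_i^2| with w = L^T v."""
    w = [v[i] + l[i] * v[i + 1] for i in range(len(l))] + [v[-1]]
    definite = sum(abs(di) * wi * wi for di, wi in zip(d, w))
    indefinite = sum(di * wi * wi for di, wi in zip(d, w))
    return definite / abs(indefinite)

# T_1 - I from Example 5.1.2 (printed diagonal minus the shift 1).
diag1 = [1.00000001117587 - 1.0, 0.999999977648258 - 1.0,
         1.00000003352761 - 1.0, 1.00000002235174 - 1.0]
off1 = [0.707106781186547, 0.707106781186546, 1.05367121277235e-8]
d1, l1 = ldlt_tridiag(diag1, off1)

# Interior eigenvector approximations printed in Example 5.1.2.
v2 = [0.4999999981373546, 2.634178061370114e-9,
      -0.4999999981373549, 0.7071067838207259]
v3 = [-0.5000000018626457, -1.317089024797209e-8,
      0.5000000018626453, 0.7071067785523688]

k2 = kappa_rel(d1, l1, v2)
k3 = kappa_rel(d1, l1, v3)
print(k2, k3)   # Example 5.2.2 reports 1.666 and 2.333
```

The same ratio is not a reliable way to evaluate the condition numbers of the extreme eigenvalues here, because the denominator is then a difference of two numbers of size about $10^7$.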

There are also factorizations where none of the eigenvalues is determined to high
relative accuracy.

Example 5.2.3 [No Eigenvalue is Relatively Well-Conditioned.] Let the matrix $T_0$
be as in Example 5.1.1, and consider

\[
T_0 - 1.075297677018139\,I = \tilde{L}\,\Omega\,\tilde{L}^T.
\]

We have intentionally chosen the above shift to be very close to an eigenvalue of the leading
2 x 2 submatrix of $T_0$. Consequently, there is a large element growth in this factorization,

\[
\mathrm{diag}(\tilde{L}) = \begin{bmatrix}
.7451829782893468 \\
2.018543658277998 \cdot 10^{-7} \\
1.819093322471722 \cdot 10^{6} \\
.2744041812115826
\end{bmatrix}, \qquad
\mathrm{diag}(\tilde{L}, -1) = \begin{bmatrix}
-.6967821655658823 \\
1.819093322471946 \cdot 10^{6} \\
1.519032487587594 \cdot 10^{-14}
\end{bmatrix},
\]

with $\mathrm{diag}(\Omega) = (-1, 1, -1, -1)$. The approximate eigenvalues of $\tilde{L}\Omega\tilde{L}^T$ are

\[
\lambda_1 = -1.0753, \quad \lambda_2 = -.0752976, \quad \lambda_3 = -.0752954, \quad \lambda_4 = .92473,
\]

and

\[
\mathrm{cond}(\tilde{L}) = \|\tilde{L}\| \cdot \|\tilde{L}^{-1}\| = 2.46 \cdot 10^{13},
\]

where this large condition number is primarily due to the large norm of $\tilde{L}$. The relative
condition numbers are found to be

\[
\kappa_{rel}(\lambda_1) = 1.1 \cdot 10^{11}, \quad \kappa_{rel}(\lambda_2) = 6.4 \cdot 10^{12}, \quad
\kappa_{rel}(\lambda_3) = 6.8 \cdot 10^{7}, \quad \kappa_{rel}(\lambda_4) = 6.5 \cdot 10^{12},
\]

and indicate that none of the eigenvalues is determined to high relative accuracy. Note that
none of the eigenvalues is small and the lack of relative accuracy implies an absence of any
absolute accuracy! □

Other examples of RRRs may be found in Section 6 of [116]. In order to see how
the relative robustness of $LDL^T$ representations of $T_0 - \mu I$ varies with $\mu$, see Figures C.1,
C.2 and C.3 in Case Study C.

5.2.3 Factorizations of Nearly Singular Tridiagonals


The purpose of taking multiple representations is to differentiate between the
individual eigenvalues in a cluster. It is crucial that the “refined” eigenvalues of the
representation based near the cluster have modest relative condition numbers, which were defined
in (5.2.8). In this section we indicate why most triangular factorizations of nearly singular
tridiagonals are relatively robust, at least for the small eigenvalues.

Since we do not have a complete theory as yet to explain all our numerical
experiments, we shall present some conjectures in this section and give some insight into why
they might be true.

In Section 5.1, we speculated on a possible correlation between the element growth
when forming

\[
T - \mu I = \tilde{L}\,\Omega\,\tilde{L}^T,
\]

and the relative robustness of the triangular factorization. In all our numerical experience,
we have always found $\tilde{L}$ and $\Omega$ to be relatively robust whenever there is no element growth.
Thus we believe the following conjecture.

Conjecture 1 If

\[
\frac{|(\tilde{L}\tilde{L}^T)_{ii}|}{|(\tilde{L}\Omega\tilde{L}^T)_{ii}|} = O(1), \quad \text{for } 1 \le i \le n, \tag{5.2.14}
\]

then $\tilde{L}$ and $\Omega$ form a relatively robust representation (RRR), where an RRR is as defined
in Section 5.2.

Note that

\[
\frac{|(\tilde{L}\tilde{L}^T)_{ii}|}{|(\tilde{L}\Omega\tilde{L}^T)_{ii}|} = \frac{|e_i^T \tilde{L}\tilde{L}^T e_i|}{|e_i^T \tilde{L}\Omega\tilde{L}^T e_i|},
\]

whereas

\[
\kappa_{rel}(\lambda_j) = \frac{|v_j^T \tilde{L}\tilde{L}^T v_j|}{|v_j^T \tilde{L}\Omega\tilde{L}^T v_j|}.
\]

The quantity on the left hand side of (5.2.14) is a measure of the element growth in
forming $\tilde{L}\Omega\tilde{L}^T$. The above conjecture speculates that $\kappa_{rel}(\lambda_j)$ is $O(1)$ for all $\lambda_j$ whenever (5.2.14)
is satisfied.
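
The left hand side of (5.2.14) is cheap to evaluate during the factorization, because $(\tilde{L}\tilde{L}^T)_{ii} = |D_i| + L_{i-1}^2 |D_{i-1}|$ and $(\tilde{L}\Omega\tilde{L}^T)_{ii} = T(i,i) - \mu$. The sketch below is a hedged illustration with a function name of our own choosing, applied to the printed matrices $T_0$ and $T_1$ of Examples 5.1.1 and 5.1.2 at the shift 1. The ratios stay modest for $T_0$ and become very large in the second row of $T_1$, in line with the element growth seen in those examples.

```python
def growth_ratios(diag, off, mu):
    """Ratios |(L~ L~^T)_ii| / |(L~ Omega L~^T)_ii| of (5.2.14) for the
    factorization T - mu*I = L D L^T of a symmetric tridiagonal T."""
    d = diag[0] - mu
    ratios = [1.0]                        # row 1: |D_1| / |T(1,1) - mu| = 1
    for i, beta in enumerate(off):
        carried = beta * beta / d         # L_i^2 * D_i
        shifted = diag[i + 1] - mu        # (L~ Omega L~^T)_{i+1,i+1}
        d = shifted - carried             # next pivot D_{i+1}
        ratios.append((abs(d) + abs(carried)) / abs(shifted))
    return ratios

# T_0 (Example 5.1.1) and T_1 (Example 5.1.2), entries as printed.
diag0 = [0.520000005885958, 0.589792290767499,
         1.89020772569828, 1.00000002235174]
off0 = [0.519230209355285, 0.36719192898916, 2.7632618547882e-8]
diag1 = [1.00000001117587, 0.999999977648258,
         1.00000003352761, 1.00000002235174]
off1 = [0.707106781186547, 0.707106781186546, 1.05367121277235e-8]

r0 = growth_ratios(diag0, off0, 1.0)
r1 = growth_ratios(diag1, off1, 1.0)
print(max(r0), max(r1))
```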

Note: All the conjectures presented here should be taken “in spirit only”, and not as precise
mathematical conjectures. For example, the condition (5.2.14) might not be exactly correct,
but a measure of element growth should be $O(1)$ to guarantee an RRR.

Even if we get a large element growth, the eigenvalues of interest may still be
determined to high relative accuracy. We saw one such occurrence in Example 5.1.2. Again,
extensive numerical testing leads us to believe the following.

Conjecture 2 Suppose that $\tilde{L}\Omega\tilde{L}^T$ is “nearly” singular. Then the eigenvalue of $\tilde{L}\Omega\tilde{L}^T$ that
is smallest in magnitude is always determined to high relative accuracy with respect to small
relative changes in entries of $\tilde{L}$.

We consider an extreme example in support of the above conjecture. Suppose $\tilde{L}\Omega\tilde{L}^T$
is a singular matrix. Since

\[
\det(\tilde{L}\Omega\tilde{L}^T) = \pm\det(\tilde{L}^2),
\]

singularity of $\tilde{L}\Omega\tilde{L}^T$ implies that a diagonal element of $\tilde{L}$ must be zero. Relative changes
in the individual entries of such an $\tilde{L}$ keep the matrix singular no matter how large the
non-zero elements of $\tilde{L}$ are (since a relative perturbation of a zero entry keeps it zero).
Thus we may expect that when $\tilde{L}\Omega\tilde{L}^T$ is close to singularity, its smallest eigenvalue will not
change appreciably due to small componentwise changes in $\tilde{L}$.
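
A 2 x 2 toy case makes the argument concrete. The sketch below is our own illustration with made-up values, not an example from the text: it takes $\tilde{L} = [[a, 0], [b, c]]$ with large $a$, $b$ and a tiny diagonal entry $c$, and $\Omega = \mathrm{diag}(1, -1)$, so that $\tilde{L}\Omega\tilde{L}^T = [[a^2, ab], [ab, b^2 - c^2]]$ has determinant $-a^2 c^2$. The small eigenvalue is obtained as the determinant divided by the large eigenvalue, which avoids cancellation. Relative perturbations of size $10^{-6}$ in the three entries change it by a relative amount of the same order, despite the large entries.

```python
import math

def small_eig_2x2(a, b, c):
    """Eigenvalue of smallest magnitude of L Omega L^T for
    L = [[a, 0], [b, c]] and Omega = diag(1, -1)."""
    trace = a * a + b * b - c * c
    det = -(a * c) ** 2
    # Larger-magnitude root of x^2 - trace*x + det = 0, computed stably.
    big = 0.5 * (trace + math.copysign(math.sqrt(trace * trace - 4.0 * det), trace))
    return det / big

# Hypothetical values: large off-diagonal data, nearly singular matrix.
a, b, c = 1.0e4, 1.0e4, 1.0e-4
delta = 1.0e-6

base = small_eig_2x2(a, b, c)
pert = small_eig_2x2(a * (1 + delta), b * (1 - delta), c * (1 + delta))
rel_change = abs(pert - base) / abs(base)
print(base, rel_change)   # rel_change is a small multiple of delta
```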

Note that in the above conjecture, we have not specified how close $\tilde{L}\Omega\tilde{L}^T$ should
be to a singular matrix. We believe that the exact condition will be a natural outcome of
a proof of the conjecture, if true.

Although Conjecture 2 is interesting in its own right, it is insufficient for our
purposes. We want all locally small eigenvalues, and not just the smallest, to be determined
to high relative accuracy. An example sheds some light on the possible scenarios.


Example 5.2.4 [2nd Smallest Eigenvalue should be Relatively Well-Conditioned.]

Consider Wilkinson’s matrix [136, p.308],

\[
W_{21}^{+} = \begin{bmatrix}
10 & 1 & & & & & 0 \\
1 & 9 & 1 & & & & \\
 & 1 & 8 & \cdot & & & \\
 & & \cdot & \cdot & \cdot & & \\
 & & & \cdot & 8 & 1 & \\
 & & & & 1 & 9 & 1 \\
0 & & & & & 1 & 10
\end{bmatrix}, \tag{5.2.15}
\]

which has pairs of eigenvalues that are close to varying degrees. For example, the eigenvalues

\[
\lambda_{14} = 7.003951798616376, \qquad \lambda_{15} = 7.003952209528675,
\]

agree in 6 leading digits. We might try and form either

\[
W_{21}^{+} - \lambda_{14} I = L_1 D_1 L_1^T, \tag{5.2.16}
\]

or

\[
W_{21}^{+} - \lambda_{15} I = L_2 D_2 L_2^T \tag{5.2.17}
\]

in order to compute orthogonal approximations to $v_{14}$ and $v_{15}$. We encounter a large element
growth when forming (5.2.16) but not in (5.2.17). Consistent with Conjectures 1 and 2,
we find $L_2 D_2 L_2^T$ to be an RRR whereas $L_1 D_1 L_1^T$ only determines its smallest eigenvalue to
high relative accuracy. In particular,

\[
\kappa_{rel}(\lambda_{14}[L_1 D_1 L_1^T]) = 2.246, \qquad \kappa_{rel}(\lambda_{15}[L_1 D_1 L_1^T]) = 7.59 \times 10^{8},
\]

while

\[
\kappa_{rel}(\lambda_{14}[L_2 D_2 L_2^T]) = 1.0, \qquad \kappa_{rel}(\lambda_{15}[L_2 D_2 L_2^T]) = 1.0.
\]

Thus $L_1 D_1 L_1^T$ is inadequate for our purposes whereas $L_2 D_2 L_2^T$ enables us to compute $v_{14}$
and $v_{15}$ that are numerically orthogonal. □
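
The difference between the two shifts shows up already in the pivots. The sketch below is a hedged illustration: the helper names are ours, the largest pivot magnitude is used as a crude stand-in for element growth, and the zero-pivot safeguard is our own addition. It builds $W_{21}^{+}$ and factors it at the two printed eigenvalues. The shift $\lambda_{14}$ is also an eigenvalue of the leading 10 x 10 block (its eigenvector is antisymmetric with a zero middle entry), so the tenth pivot is tiny and the eleventh is huge; at $\lambda_{15}$ all pivots stay modest.

```python
def wilkinson_plus(m=10):
    """Diagonal and off-diagonal of W^+_{2m+1}: diag = m, ..., 1, 0, 1, ..., m."""
    diag = [float(abs(i)) for i in range(-m, m + 1)]
    off = [1.0] * (2 * m)
    return diag, off

def ldlt_pivots(diag, off, shift):
    """Pivots of T - shift*I = L D L^T for a symmetric tridiagonal T."""
    d = [diag[0] - shift]
    for i, beta in enumerate(off):
        if d[i] == 0.0:
            d[i] = 1e-300   # hypothetical safeguard against an exactly zero pivot
        d.append((diag[i + 1] - shift) - beta * beta / d[i])
    return d

diag, off = wilkinson_plus(10)
lam14 = 7.003951798616376
lam15 = 7.003952209528675

max_pivot_14 = max(abs(x) for x in ldlt_pivots(diag, off, lam14))
max_pivot_15 = max(abs(x) for x in ldlt_pivots(diag, off, lam15))
print(max_pivot_14, max_pivot_15)
```

The exact size of the large pivot at $\lambda_{14}$ depends on rounding, so only its order of magnitude is meaningful.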

Finally, as in the above example, we believe we can always find a partial RRR
based at one of the eigenvalues in the cluster.

Conjecture 3 Suppose $T$ is a tridiagonal matrix and $\lambda_j, \lambda_{j+1}, \ldots, \lambda_{j+m-1}$ are
approximations to $m$ “close” eigenvalues of $T$ that agree in almost all their digits. Then at least one
of the factorizations

\[
T - \lambda_s I = L_s D_s L_s^T, \quad j \le s < j + m,
\]

is a partial RRR($j, j+1, \ldots, j+m-1$), i.e., $L_s D_s L_s^T$ determines its locally small eigenvalues
to high relative accuracy.

The relative robustness of factorizations of nearly singular tridiagonals is consistent
with our earlier Conjecture 1. This is because we do not get any element growth in the
generic case when forming

\[
T - \lambda I = LDL^T,
\]

where $\lambda$ is close to an eigenvalue of $T$.

Conjecture 3 is exactly what we need to achieve our goal of computing
orthogonal eigenvectors. Furthermore, it would be even more efficient if we could easily find the
particular $\lambda_s$ in Conjecture 3 that leads to a relatively robust representation.

We leave it for future studies to provide further numerical evidence and theory to
prove or disprove the validity of the conjectures made in this section. From now on, we will
assume that Conjecture 3 is true so that we can find the desired RRRs.

5.2.4  Other  RRRs 

Until now, we have only considered computing a triangular decomposition, $LDL^T$
or $UDU^T$, as an RRR. However, we could also consider the twisted factorizations introduced
in Section 3.1 as RRRs. Indeed, since the set of $n$ possible twisted factorizations

$T - \mu I = N^{(k)} \Delta^{(k)} N^{(k)T}$, $1 \le k \le n$,

includes the $LDL^T$ decomposition, it may be easier to form such RRRs. Besides, it is
conceivable that a twisted factorization may be qualitatively better than a triangular
decomposition in terms of its relative perturbation properties. We believe that all conjectures
of the previous section also hold for twisted factorizations, and not only for triangular
factorizations.

5.3  Orthogonality  using  Multiple  Representations 

The  examples  of  the  previous  section  indicate  that  eigenvectors  computed  using  a 
single  RRR  turn  out  to  be  numerically  orthogonal.  If  relative  gaps  are  small,  we  form  an 
RRR near the cluster to get locally small eigenvalues that have large relative gaps.
Sometimes, as in Example 5.1.2, we can only obtain a partial RRR, the use of which guarantees
orthogonality of a limited set of eigenvector approximations. In such a case, the remaining
eigenvectors  need  to  be  computed  using  a  different  RRR.  In  this  section,  we  explain  why  the 
approximations  computed  from  one  RRR  are  numerically  orthogonal  to  the  approximations 
computed  using  another  RRR.  An  added  benefit  of  the  representation  trees  that  we  will 
introduce  for  our  exposition  is  that  they  summarize  the  computations  involved  and  help  us 
in  identifying  computationally  efficient  strategies. 

5.3.1 Representation Trees

In  Section  4.5,  we  provided  a  detailed  proof  of  orthogonality  of  the  eigenvectors 
computed by Algorithm X using the $LDL^T$ decomposition of a positive definite tridiagonal
matrix  when  the  eigenvalues  have  large  relative  gaps.  This  proof  can  easily  be  generalized 
to  the  case  when  any  fixed  RRR  is  used  for  the  computation.  Similarly  we  can  furnish 
a  detailed  proof  of  orthogonality  when  different  RRRs  are  used  for  computing  different 
eigenvectors.  However,  to  avoid  being  drowned  in  detail,  we  choose  an  alternate  approach. 
We  now  introduce  representation  trees  that  informally  indicate  why  vectors  computed  using 
one  RRR  are  orthogonal  to  those  computed  from  another  RRR.  A  formal  treatment  and 
greater  level  of  detail  as  in  Corollary  4.5.4  can  also  be  reproduced  by  mimicking  its  proof. 

We  remind  the  reader  of  our  indexing  convention  for  eigenvalues,  by  which 


$\lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$.


A representation tree consists of nodes that are denoted by $(R, I)$ where $R$ stands
for an RRR and $I$ is a subset of the index set $\{1, 2, \ldots, n\}$. Informally, each node denotes
an RRR that is used en route to computing eigenvectors of the eigenvalues $\lambda_i[R]$ indexed
by $I$, i.e., $i \in I$ (note that we have used $R$ in two ways: to denote the RRR and also for
the underlying matrix defined by the RRR). A parent node $(R_p, I_p)$ can have $m$ children
$(R_j, I_j)$, $1 \le j \le m$, such that

$\bigcup_j I_j = I_p$.

Any pair of the child nodes $(R_1, I_1)$, $(R_2, I_2)$ must satisfy

$I_1 \subset I_p$, $I_2 \subset I_p$, $I_1 \cap I_2 = \phi$, (5.3.18)

and $\mathrm{relgap}_2(\lambda_i[R_p], \lambda_j[R_p]) \ge 1/n$, $\forall i \in I_1$, $j \in I_2$, $i \ne j$, (5.3.19)

where $\mathrm{relgap}_2$ is as defined in Section 4.3. Nodes that have no children will be called leaf
nodes while all other nodes are called intermediate nodes. The index set associated with any
leaf node must be a singleton. Each edge connecting a parent node to a child node will be
labeled by a real number $\mu$. Informally, an edge denotes the action of forming an RRR by
applying a shift of $\mu$ to the matrix denoted by the parent RRR. Additionally, the shifts used
to form the leaf representations must be very accurate approximations to the appropriate
eigenvalues of the leaf's parent RRR. The condition on the relative gaps given by (5.3.19)
ensures that different RRRs are formed in order to compute eigenvectors when relative gaps
are smaller than $1/n$. Note that we choose $1/n$ as the cut-off point for relative gaps since
this translates into an acceptable bound of $O(n\varepsilon)$ for the dot products, see Corollary 4.5.4
for details. The above conditions should become clearer when we examine some example
representation trees.
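
The bookkeeping behind a representation tree can be sketched in a few lines of Python. The sketch below is only an illustration: the names `Node` and `check_tree` are ours and do not come from the thesis, and the shift values in the example tree are placeholders. It checks the index-set conditions (children partition the parent's index set as in (5.3.18), and leaves are singletons); it does not check (5.3.19), which needs the eigenvalues of each parent RRR and the definition of relgap2 from Section 4.3.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node (R, I) of a representation tree (illustrative names)."""
    name: str                 # label for the representation R
    index_set: frozenset      # eigenvalue indices I handled at or below this node
    shift: float = 0.0        # edge label: shift applied to the parent to form R
    children: list = field(default_factory=list)


def check_tree(node):
    """Return True if children partition each parent's index set
    and every leaf has a singleton index set."""
    if not node.children:
        return len(node.index_set) == 1
    union = set()
    for child in node.children:
        if union & child.index_set:       # sibling index sets must be disjoint
            return False
        union |= child.index_set
    if union != set(node.index_set):      # children must cover the parent's set
        return False
    return all(check_tree(child) for child in node.children)


# A tree shaped like the one described for Figure 5.1 (shift values are placeholders).
inner = Node("L0 D0 L0^T", frozenset({2, 3}), shift=1.0)
inner.children = [
    Node("twisted factorization for index 2", frozenset({2}), shift=1.5e-8),
    Node("twisted factorization for index 3", frozenset({3}), shift=3.0e-8),
]
root = Node("Lp Dp Lp^T", frozenset({1, 2, 3, 4}))
root.children = [
    Node("twisted factorization for index 1", frozenset({1}), shift=2.2e-16),
    inner,
    Node("twisted factorization for index 4", frozenset({4}), shift=2.0),
]
```

Here `check_tree(root)` accepts the tree, while a parent whose children miss one of its indices is rejected.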

In  all  the  representation  trees  we  consider  in  this  thesis,  the  RRRs  associated  with 
intermediate  nodes  will  be  bidiagonal  factors  of  either  the  given  tridiagonal  T  or  its  translate. 
Each  RRR  must  determine  to  high  relative  accuracy  all  the  eigenvalues  associated  with  its 
index  set.  However,  the  representations  associated  with  leaf  nodes  are,  in  general,  twisted 
factorizations.  As  we  saw  in  Section  4.5,  relative  robustness  of  these  twisted  factorizations  is 
not  necessary  for  the  orthogonality  of  the  computed  eigenvectors.  However,  we  have  always 
found  such  a  twisted  factorization  to  determine  its  smallest  eigenvalue  to  high  relative 
accuracy,  see  Conjecture  2  and  Section  5.2.4.  Recall  that  the  eigenvector  approximations  are 
formed  by  multiplying  elements  of  these  twisted  factorizations,  see  Step  (4d)  of  Algorithm  X. 

A  representation  tree  denotes  the  particular  computations  involved  in  computing 
orthogonal  eigenvectors.  Since  we  are  concerned  about  forming  representations  that  are 
relatively  robust,  it  is  crucial  that  we  use  the  differential  transformations  of  Section  4.4.1 
when  forming 

$L_p D_p L_p^T - \mu I = L_1 D_1 L_1^T$,

so  that  we  can  relate  the  computed  decomposition  to  a  small  componentwise  perturbation 
of  Lp  and  Dp. 

Example 5.3.1 [A Representation Tree.] The RRR tree in Figure 5.1 summarizes a
computation of the eigenvectors of the matrix $T_0$ presented in Example 5.1.1. Figure 5.1,
where ovals denote intermediate representations while rectangles stand for leaf nodes,
reflects the following information. The initial decomposition

$T_0 = L_p D_p L_p^T$

has eigenvalues

$\lambda_1 \approx \varepsilon$, $\lambda_2 \approx 1 + \sqrt{\varepsilon}$, $\lambda_3 \approx 1 + 2\sqrt{\varepsilon}$, $\lambda_4 \approx 2$.

The extreme eigenvectors are computed using the twisted factorizations

$L_p D_p L_p^T - \hat\lambda_1 I = N_1^{(1)} \Delta_1 N_1^{(1)T}$, $L_p D_p L_p^T - \hat\lambda_4 I = N_4^{(3)} \Delta_4 N_4^{(3)T}$,

where $|\hat\lambda_i - \lambda_i| = O(\varepsilon |\lambda_i|)$ and the superscript $r$ in $N_i^{(r)}$ denotes the twist position. The
intermediate representation

$L_p D_p L_p^T - I = L_0 D_0 L_0^T$

is needed since $\mathrm{relgap}_2(\hat\lambda_2, \hat\lambda_3) = O(\sqrt{\varepsilon}) < 1/n$. The two smallest eigenvalues of $L_0 D_0 L_0^T$
are computed to be

$\delta\hat\lambda_2 \approx \sqrt{\varepsilon}$, and $\delta\hat\lambda_3 \approx 2\sqrt{\varepsilon}$.

The corresponding eigenvectors may now be computed as

$L_0 D_0 L_0^T - \delta\hat\lambda_2 I = N_2^{(4)} \Delta_2 N_2^{(4)T}$, $L_0 D_0 L_0^T - \delta\hat\lambda_3 I = N_3^{(r)} \Delta_3 N_3^{(r)T}$.

As we mentioned earlier, all the factorizations in the representation tree should be formed
by the differential transformations of Section 4.4.1. □

Figure 5.1: Representation Tree. Forming an extra RRR based at 1

Example 5.3.2 [A Better Representation Tree.] From Example 5.2.1, we know that
the $L_0 D_0 L_0^T$ decomposition of $T_0 - I$ determines all its eigenvalues to high relative accuracy.
Thus the scheme given by the representation tree of Figure 5.2 can alternatively be used
to compute orthogonal eigenvectors. The bulk of the work lies in computing the relevant
subset of eigenvalues of the intermediate RRRs. Thus the representation tree of Figure 5.2
yields a more efficient computation scheme than Figure 5.1. □

Figure 5.2: Representation Tree. Only using the RRR based at 1

Example 5.3.3 [Only One Representation Tree is Possible.] Orthogonal eigenvectors
of the matrix $T_1$ given in Example 5.1.2 may be computed as shown by the
representation tree of Figure 5.3. By Example 5.2.2, $L_1 D_1 L_1^T$ is only a partial RRR, and cannot be
used to compute the extreme eigenvectors. Figure C.3 in Case Study C shows that there
is no representation based near 1 that is relatively robust for all its eigenvalues and hence,
there is no representation tree for $T_1$ that looks like Figure 5.2. □

We  now  indicate  why  the  computed  eigenvectors  are  numerically  orthogonal  when 
the  particular  computation  scheme  is  described  by  a  representation  tree.  We  re-emphasize 
the  following  facts  that  are  important  for  showing  orthogonality. 

i.  Each  intermediate  node  is  a  partial  RRR  for  the  eigenvalues  associated  with  its  index 
set. 

ii.  Each  node,  or  representation,  is  formed  by  the  differential  transformations  given  in 
Section  4.4.1. 

iii.  The  approximation  used  to  form  a  leaf  representation  agrees  in  almost  all  its  digits 
with  the  relevant  eigenvalue  of  its  parent  node  (as  given  by  the  index  of  the  leaf  node). 

iv.  The  eigenvector  approximation  is  formed  solely  by  multiplications  of  elements  of  the 
twisted  factorizations  that  are  represented  by  the  leaf  nodes. 

v.  Whenever  the  relative  gap  between  eigenvalues  of  a  representation  is  smaller  than 
1/n,  a  new  representation  that  is  relatively  robust  for  its  locally  small  eigenvalues  is 
formed. 

Figure 5.3: Representation Tree. An extra RRR based at 1 is essential

The  first  four  facts  outlined  above  can  be  used,  as  in  Section  4.5,  to  prove  the 
orthogonality  of  eigenvector  approximations  computed  from  leaf  representations  that  have 
a  common  parent.  Note  that  the  above  statement  is  analogous  to  saying  that  the  vectors 
computed  when  Steps  3  and  4  of  Algorithm  X  are  applied  to  an  RRR  can  be  shown  to  be 
orthogonal.  Recall  that  in  Section  4.5,  orthogonality  is  proved  by  relating  each  computed 
vector  to  an  exact  eigenvector  of  the  matrix  defined  by  the  parent  RRR. 

Representation trees come in handy in seeing why the computed vectors are
orthogonal when more than one intermediate RRR is used. Let us consider Figure 5.1. The
computed vectors $\hat v_2$ and $\hat v_3$ are orthogonal for the reasons given in the above paragraph.
But why is $\hat v_2$ orthogonal to $\hat v_1$? Note that $\hat v_1$ is computed using

$L_p D_p L_p^T - \hat\lambda_1 I = N_1^{(1)} \Delta_1 N_1^{(1)T}$,

while $\hat v_2$ is computed from

$L_0 D_0 L_0^T - \delta\hat\lambda_2 I = N_2^{(4)} \Delta_2 N_2^{(4)T}$.

However,

• both $L_p D_p L_p^T$ and $L_0 D_0 L_0^T$ are RRRs;

• $L_0 D_0 L_0^T$ is a translate of $L_p D_p L_p^T$ and is computed by the differential dstqds
transformation of Section 4.4.1, and the roundoff error analysis given in Theorem 4.4.2 shows
that an exact relation exists between small componentwise perturbations of $L_p$, $D_p$
and $L_0$, $D_0$;

• the relative gap between $\lambda_1$ and $\lambda_2$ is large.

The vector $\hat v_1$ can directly be shown to be close to the first eigenvector of $L_p D_p L_p^T$,
while $\hat v_2$ can be similarly related to the second eigenvector of $L_p D_p L_p^T$ but via $L_0 D_0 L_0^T$. The
relative robustness of both these representations along with the above mentioned facts can
now be used to prove that $\hat v_1$ and $\hat v_2$ are orthogonal.

In general, any two eigenvectors computed from the twisted factorizations
represented by the leaf nodes of a representation tree will be "good" approximations to distinct
eigenvectors  of  a  common  ancestor  RRR.  The  properties  satisfied  by  a  representation  tree 
now  imply  that  these  computed  vectors  must  be  orthogonal.  Note  that  the  detailed  bounds 
on  the  dot  products  of  the  computed  vectors  can  get  rather  messy  and  would  involve  the 
number  of  intermediate  representations  employed  in  computing  an  eigenvector.  On  the 
other  hand,  representation  trees  provide  a  visual  tool  that  makes  it  easy  to  see  why  the 
computed  vectors  are  nearly  orthogonal.  Besides  this  use,  representation  trees  also  allow 
us  to  clearly  see  which  computations  are  more  efficient. 

5.4  Algorithm  Y  —  orthogonality  even  when  relative  gaps 
are  small 

We now present an algorithm that also handles small relative gaps but still
performs the computation in $O(n^2)$ time.

Algorithm Y [Computes orthogonal eigenvectors.]

1. Find $\mu \le \|T\|$ such that $T + \mu I$ is positive (or negative) definite.

2. Compute $T + \mu I = L_p D_p L_p^T$.

3. Compute the eigenvalues, $\hat\lambda_j$, of $L_p D_p L_p^T$ to high relative accuracy by the dqds
algorithm [56].

4. Set $l \leftarrow 1$, $m \leftarrow n$.

5. Group the computed eigenvalues $\hat\lambda_l, \ldots, \hat\lambda_m$ into the categories:

isolated. $\hat\lambda_j$ is isolated if

$\min(\mathrm{relgap}_2(\hat\lambda_j, \hat\lambda_{j+1}), \mathrm{relgap}_2(\hat\lambda_{j-1}, \hat\lambda_j)) \ge 1/n$.

clustered. $\hat\lambda_j, \ldots, \hat\lambda_{j+k-1}$ form a "cluster" of $k$ eigenvalues if

$\mathrm{relgap}_2(\hat\lambda_i, \hat\lambda_{i+1}) < 1/n$, $j \le i < j + k - 1$,
while $\mathrm{relgap}_2(\hat\lambda_{j-1}, \hat\lambda_j) \ge 1/n$, and $\mathrm{relgap}_2(\hat\lambda_{j+k-1}, \hat\lambda_{j+k}) \ge 1/n$.

6. For each isolated eigenvalue, $\hat\lambda = \hat\lambda_j$, $l \le j \le m$, do the following.

(a) Compute $L_p D_p L_p^T - \hat\lambda I = L_+ D_+ L_+^T$ by the dstqds transform (Algorithm 4.4.3).

(b) Compute $L_p D_p L_p^T - \hat\lambda I = U_- D_- U_-^T$ by the dqds transform (Algorithm 4.4.5).

(c) Compute $\gamma_k$ as in the top formula of (4.4.26). Pick $r$ such that $|\gamma_r| = \min_k |\gamma_k|$.

(d) Form the approximate eigenvector $z_j$ by solving $N_r D_r N_r^T z_j = \gamma_r e_r$ (see
Theorem 3.2.2):

$z_j(r) = 1$,

$z_j(i) = -L_+(i) \cdot z_j(i+1)$, $i = r-1, \ldots, 1$,

$z_j(l+1) = -U_-(l) \cdot z_j(l)$, $l = r, \ldots, n-1$.

7. For each cluster $\hat\lambda_j, \ldots, \hat\lambda_{j+k-1}$ do the following.

(a) Get a partial RRR$(j, \ldots, j+k-1)$ by forming the dstqds transformation

$L_p D_p L_p^T - \hat\lambda_s I = L_s D_s L_s^T$,

where $j \le s \le j + k - 1$.

(b) Compute the $j$th through $(j+k-1)$th eigenvalues of $L_s D_s L_s^T$ to high relative
accuracy and call them $\delta\hat\lambda_j, \ldots, \delta\hat\lambda_{j+k-1}$.

(c) Set $l \leftarrow j$, $m \leftarrow j + k - 1$, $\hat\lambda_i \leftarrow \delta\hat\lambda_i$ for $j \le i \le j + k - 1$, $L_p \leftarrow L_s$, $D_p \leftarrow D_s$,
and go to Step 5.

□
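
The grouping in Step 5 can be sketched as follows. This is a minimal illustration under stated assumptions: `relgap` below is a simple stand-in, $|a - b| / \max(|a|, |b|)$, and not necessarily the relgap2 of Section 4.3; the function names are ours; indices are 0-based, unlike the 1-based indices of the text. The threshold defaults to $1/n$ for the list passed in, so a caller recursing on a cluster would pass the threshold of the full problem explicitly.

```python
import math


def relgap(a, b):
    """Stand-in relative gap (an assumption, not the thesis' relgap2)."""
    if a == b:
        return 0.0
    return abs(a - b) / max(abs(a), abs(b))


def group_eigenvalues(lam, tol=None):
    """Split sorted eigenvalues into isolated indices and clusters.

    Returns (isolated, clusters): a list of indices and a list of index lists.
    Neighbours whose relative gap is below tol land in the same cluster.
    """
    n = len(lam)
    if tol is None:
        tol = 1.0 / n
    isolated, clusters = [], []
    start = 0
    for i in range(1, n + 1):
        # Close the current run when we reach the end or hit a large gap.
        if i == n or relgap(lam[i - 1], lam[i]) >= tol:
            run = list(range(start, i))
            if len(run) == 1:
                isolated.append(run[0])
            else:
                clusters.append(run)
            start = i
    return isolated, clusters


# Spectrum shaped like the one in Example 5.3.1, with eps taken as 2**-52.
eps = 2.0 ** -52
spectrum = [eps, 1 + math.sqrt(eps), 1 + 2 * math.sqrt(eps), 2.0]
isolated, clusters = group_eigenvalues(spectrum)
```

For this spectrum the two extreme eigenvalues come out isolated and the middle pair forms one cluster, which matches the shape of the tree in Figure 5.1.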

The  previous  section  indicates  why  the  vectors  computed  by  Algorithm  Y  are 
numerically  orthogonal.  We  now  present  detailed  numerical  results  comparing  a  computer 
implementation  of  Algorithm  Y  with  existing  software  routines. 

Chapter 6

A  Computer  Implementation 


In this chapter, we give detailed timing and accuracy results of a computer
implementation of Algorithm Y, whose pseudocode was given in Section 5.4. The only uncertainty
in implementing this algorithm is in its Step (7a), where we need to choose a shift $\mu$ near
a cluster in order to form the relatively robust representation

$L_p D_p L_p^T - \mu I = LDL^T$. (6.0.1)

We briefly discuss our strategy in Section 6.1. Having found a suitable representation, we
need to find its locally small eigenvalues to high relative accuracy in Step (7b) of
Algorithm Y. In Section 6.2, we give an efficient scheme to find these small eigenvalues to the
desired accuracy. Note that the earlier steps of Algorithm Y have been discussed in great
detail in Chapter 4.

Finally,  in  Section  6.4,  we  give  detailed  timing  and  accuracy  results  comparing  our 
computer  implementation  of  Algorithm  Y  with  existing  LAPACK  and  EISPACK  software. 
Our  test-bed  contains  a  variety  of  tridiagonal  matrices,  some  from  quantum  chemistry 
applications,  that  highlight  the  sensitivity  of  earlier  algorithms  to  different  distributions  of 
eigenvalues.  The  test  matrices  are  discussed  in  Section  6.4.1. 

We find that our implementation of Algorithm Y is uniformly faster than earlier
implementations of inverse iteration, while still being accurate. This speed is by virtue
of the $O(n^2)$ running time of Algorithm Y as opposed to the earlier algorithms that take
$O(n^3)$ time in general. We want to stress that the results presented in this chapter are
preliminary; we can envisage more algorithmic enhancements and a better use of the
memory hierarchy and architectural design (such as multiple functional units in the CPU)
of modern computers to further speed up our computer implementation. Some of these
enhancements are briefly discussed in Section 6.5, and we hope to incorporate such code
improvements in the near future.

6.1  Forming  an  RRR 

Two questions that need to be resolved to get a computer implementation of
Step (7a) of Algorithm Y were raised in Sections 5.2.1 and 5.2.3:

1. What shift $\mu$ near a cluster should we choose so that the $LDL^T$ decomposition
of (6.0.1) is an RRR?

2. Given $LDL^T$, how can we cheaply decide if it is an RRR?

As we saw earlier in Example 5.2.3, not every $\mu$ in (6.0.1) leads to an RRR. We
do not have any a priori way of knowing whether an arbitrary choice of $\mu$ would lead to
a desired RRR. Indeed, as Example 5.2.4 suggests, there may not be any alternative other
than actually forming $LDL^T$ at a judicious choice of $\mu$ and then, a posteriori, checking
whether this decomposition forms an RRR. Since we want an efficient procedure for the
latter purpose, we cannot afford to evaluate the relative condition numbers of Section 5.2.1.
Thus, answers to the two questions posed above are crucial to a correct implementation.

We made several conjectures in Section 5.2.3 that attempt to answer these
questions. In our computer implementation, we have made the following decisions which reflect
our belief in these conjectures.

1. After identifying a cluster of eigenvalues $\hat\lambda_j, \hat\lambda_{j+1}, \ldots, \hat\lambda_{j+k-1}$, we restrict our search
for an RRR to a factorization based at one of these $\hat\lambda$'s, i.e., to

$L_p D_p L_p^T - \hat\lambda_s I = L_s D_s L_s^T$, $j \le s \le j + k - 1$. (6.1.2)

This is the strategy given in Step (7a) of Algorithm Y and is consistent with
Conjecture 3.

2. When trying to form the desired RRR in (6.1.2), we try $\mu = \hat\lambda_s$ in the order $s =
j, j+1, \ldots, j+k-1$. If we find the element growth, as defined in (5.1.2), at some $\hat\lambda_s$
to be less than an acceptable tolerance, say 200, then we immediately accept $L_s D_s L_s^T$
as the desired RRR. By this strategy, we are often able to form an RRR in the first
attempt, i.e., based at the leftmost eigenvalue $\mu = \hat\lambda_j$. This saves us the extra work
of forming all $k$ possible factorizations of (6.1.2). Note that this approach reflects our
belief in Conjecture 1.

3. If all the above choices of $\mu$ lead to element growths bigger than 200, as in
Example 5.2.2 (see also Case Study C), we choose the $L_s D_s L_s^T$ decomposition that leads to
the least element growth as our partial RRR.
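
The three decisions above amount to a short selection loop, sketched here in Python. This is an illustration, not the thesis code: `choose_shift` is a name we introduce, and the element-growth measure of (5.1.2) is not reproduced here, so it is passed in as a caller-supplied function `element_growth` (hypothetical).

```python
def choose_shift(candidates, element_growth, tol=200.0):
    """Pick a shift for the (partial) RRR of a cluster.

    candidates     : eigenvalue approximations of the cluster, left to right
    element_growth : callable returning the element growth observed when the
                     factorization is formed at a given shift (caller-supplied)
    tol            : acceptance tolerance; 200 is the value quoted in the text

    Returns (shift, growth): the first candidate whose growth is below tol,
    otherwise the candidate with the least growth. Returns None if there are
    no candidates.
    """
    best = None
    for mu in candidates:
        growth = element_growth(mu)
        if growth < tol:
            return mu, growth            # accept immediately, skip the rest
        if best is None or growth < best[1]:
            best = (mu, growth)
    return best


# Toy usage with made-up growth values standing in for real factorizations.
toy_growth = {1.0: 5.0e6, 1.1: 3.0e3, 1.2: 8.0e4}
pick = choose_shift([1.0, 1.1, 1.2], toy_growth.get)
```

With the made-up values above no candidate is below the tolerance, so the least-growth shift is returned, as in decision 3.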

We  want  to  emphasize  that  even  though  we  cannot  prove  the  correctness  of  the 
above  decisions  as  yet,  our  computer  implementation  gives  accurate  answers  on  all  our  tests. 
We  have  tested  our  implementation  on  tridiagonal  matrices  that  are  quite  varied  in  their 
eigenvalue  distributions.  See  Section  6.4.1  for  details. 

6.2  Computing  the  Locally  Small  Eigenvalues 

Until now, we have not discussed ways of efficiently computing the eigenvalues of
a relatively robust representation. All eigenvalues of the $LDL^T$ decomposition of a positive
definite matrix may be efficiently found by the dqds algorithm, and this is the method
employed in Step 3 of Algorithm Y. In its present form, the dqds algorithm finds the
eigenvalues in sequential order, from the smallest to the largest, and always operates on a
positive definite matrix. See [56] for more details. The main difficulty in trying to employ
the dqds algorithm to find the locally small eigenvalues of an RRR is that in most cases,
the RRR will be the factorization of an indefinite matrix. It is not known, as yet, if the
dqds algorithm can be adapted to an indefinite case and hence we need to find an alternate
method.

One means of finding the locally small eigenvalues is the bisection algorithm, using
any of the differential transformations given in Section 4.4.1 as the inner loop. However,
since bisection is a rather slow method, it could become the dominant part of the
computation. So we use a faster scheme which is a slight variant of the Rayleigh Quotient Iteration
(RQI). A traditional RQI method starts with some (well-chosen) vector $q_0$ and progresses by
computing Rayleigh Quotients to get increasingly better approximations to an eigenvalue.

Algorithm 6.2.1 [Traditional RQI.]

1. Choose a vector $q_0$ ($\|q_0\| = 1$), and a scalar $\theta_0$. Set $i \leftarrow 0$.

2. Solve $(T - \theta_i I) x_{i+1} = q_i$ for $x_{i+1}$.

3. Set $q_{i+1} \leftarrow x_{i+1} / \|x_{i+1}\|$, $\theta_{i+1} \leftarrow q_{i+1}^T T q_{i+1}$, $i \leftarrow i + 1$, and repeat Step 2. □

As shown in Corollary 3.2.1, a twisted factorization allows us to cheaply compute
the Rayleigh Quotient of the vector $z$ where

$(T - \theta I) z = \gamma_r e_r$, $z(r) = 1$, (6.2.3)

and $\gamma_r$ is an element of the twisted factorization at twist position $r$. It is immediately seen
from (6.2.3) that

$\frac{z^T (T - \theta I) z}{z^T z} = \frac{\gamma_r}{z^T z}$.

As discussed earlier, it is possible to choose $r$ so that $\gamma_r$ is proportional to the distance
of $\theta$ from an eigenvalue of $T$. The index $r$ where $|\gamma_r| = \min_k |\gamma_k|$ is one such choice, see
Section 3.1 for more details. Thus we get the following iteration scheme.

Algorithm 6.2.2 [RQI-Like (Computes an eigenvalue of $L_s D_s L_s^T$).]

1. Choose a scalar $\theta_0$. Set $i \leftarrow 0$.

2. Choose an index $r$ as follows:

(a) Compute $L_s D_s L_s^T - \theta_i I = L_+ D_+ L_+^T$ by the dstqds transform (Algorithm 4.4.3).

(b) Compute $L_s D_s L_s^T - \theta_i I = U_- D_- U_-^T$ by the dqds transform (Algorithm 4.4.5).

(c) Compute $\gamma_k$ as in the top formula of (4.4.26). Pick $r$ such that $|\gamma_r| = \min_k |\gamma_k|$.

3. Solve $(L_s D_s L_s^T - \theta_i I) z_i = \gamma_r e_r$ as follows:

$z_i(r) = 1$,

$z_i(p) = -L_+(p) \cdot z_i(p+1)$, $p = r-1, \ldots, 1$,

$z_i(q+1) = -U_-(q) \cdot z_i(q)$, $q = r, \ldots, n-1$.

4. Set $\theta_{i+1} \leftarrow \theta_i + \gamma_r / \|z_i\|^2$, $i \leftarrow i + 1$. Repeat Step 2. □

In the above RQI-like scheme, we obtain the Rayleigh Quotient as a by-product of
computing the vector $z_i$. One iteration of the above algorithm is only 2-3 times as expensive
as one bisection step, but convergence is at least quadratic. Note that Step 3 given above
differs from traditional RQI in its choice of $e_r$ as the right hand side of the linear system to
be solved.
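
The RQI-like scheme can be sketched in plain Python. The sketch rests on assumptions we state explicitly: the two transforms below are the standard differential stationary and progressive qd recurrences as we understand them, not copies of Algorithms 4.4.3 and 4.4.5; the twist elements are taken as $\gamma_k = s_k + p_k + \theta$, which we assume corresponds to the formula referenced in (4.4.26); all function names are ours; there are no safeguards against zero pivots. Indices are 0-based.

```python
import math


def ldl_factor(a, b):
    """LDL^T factors (d, l) of a symmetric tridiagonal with diagonal a, off-diagonal b."""
    d, l = [a[0]], []
    for i in range(len(b)):
        l.append(b[i] / d[i])
        d.append(a[i + 1] - b[i] * l[i])
    return d, l


def dstqds(d, l, mu):
    """Stationary transform: L D L^T - mu*I = L+ D+ L+^T. Returns (L+, s)."""
    n = len(d)
    s, lplus = [0.0] * n, [0.0] * (n - 1)
    s[0] = -mu
    for i in range(n - 1):
        dplus = s[i] + d[i]
        lplus[i] = d[i] * l[i] / dplus
        s[i + 1] = lplus[i] * l[i] * s[i] - mu
    return lplus, s


def dqds(d, l, mu):
    """Progressive transform: L D L^T - mu*I = U- D- U-^T. Returns (U-, p)."""
    n = len(d)
    p, uminus = [0.0] * n, [0.0] * (n - 1)
    p[n - 1] = d[n - 1] - mu
    for i in range(n - 2, -1, -1):
        dminus = d[i] * l[i] * l[i] + p[i + 1]
        t = d[i] / dminus
        uminus[i] = l[i] * t
        p[i] = p[i + 1] * t - mu
    return uminus, p


def rqi_like(d, l, theta, max_iter=20, tol=1e-14):
    """Iterate theta <- theta + gamma_r / ||z||^2; returns (theta, z)."""
    n = len(d)
    z = [1.0] * n
    for _ in range(max_iter):
        lplus, s = dstqds(d, l, theta)
        uminus, p = dqds(d, l, theta)
        gamma = [s[k] + p[k] + theta for k in range(n)]   # twist elements
        r = min(range(n), key=lambda k: abs(gamma[k]))
        z = [0.0] * n
        z[r] = 1.0
        for i in range(r - 1, -1, -1):                    # upward from the twist
            z[i] = -lplus[i] * z[i + 1]
        for i in range(r, n - 1):                         # downward from the twist
            z[i + 1] = -uminus[i] * z[i]
        step = gamma[r] / sum(x * x for x in z)
        theta += step
        if abs(step) <= tol * abs(theta):
            break
    return theta, z
```

As a usage example, the 6 x 6 (1,2,1) Toeplitz matrix has the known eigenvalues $4 \sin^2[k\pi/14]$, so factoring it with `ldl_factor([2.0] * 6, [1.0] * 5)` and starting `rqi_like` within $10^{-3}$ of one of these values should recover it to roughly working precision in a handful of iterations.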

As outlined in Step (7a) of Algorithm Y (see Section 5.4), the representation
$L_s D_s L_s^T$ is a translate of the original matrix $L_p D_p L_p^T$, i.e.,

$L_p D_p L_p^T - \mu I = L_s D_s L_s^T$. (6.2.4)

If $\lambda$ is an eigenvalue of $L_p D_p L_p^T$, $\lambda - \mu$ is the corresponding eigenvalue of $L_s D_s L_s^T$ if
the relation (6.2.4) holds exactly. However, we only know an approximate eigenvalue $\hat\lambda$, and
roundoff errors are inevitable in forming $L_s$ and $D_s$. But, even though $\hat\lambda - \mu$ is not an exact
eigenvalue of the computed $L_s D_s L_s^T$, it does give us a very good starting approximation.
As a result, $\theta_0$ can be initialized to $\hat\lambda - \mu$ in Step 1 of our RQI-like scheme. Of course, as
emphasized in Chapter 5, we need to compute each small eigenvalue of $L_s D_s L_s^T$ to high
relative accuracy. Since we compute both forward and backward pivots, $D_+$ and $D_-$, we
can include the safeguards of bisection in our iteration.
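
One way to read the remark on bisection safeguards is through Sylvester's law of inertia: the number of negative pivots in $D_+$ equals the number of eigenvalues of the factored matrix that lie below the shift. The Python sketch below illustrates that count only. It is our interpretation, not the thesis implementation; the function names are ours, the recurrence is the standard differential stationary qd transform as we understand it, and zero pivots are not handled.

```python
def ldl_factor(a, b):
    """LDL^T factors (d, l) of a symmetric tridiagonal with diagonal a, off-diagonal b."""
    d, l = [a[0]], []
    for i in range(len(b)):
        l.append(b[i] / d[i])
        d.append(a[i + 1] - b[i] * l[i])
    return d, l


def dstqds_pivots(d, l, mu):
    """Pivots D+ of L D L^T - mu*I = L+ D+ L+^T (stationary transform)."""
    n = len(d)
    s = -mu
    pivots = []
    for i in range(n - 1):
        dplus = s + d[i]
        pivots.append(dplus)
        lplus = d[i] * l[i] / dplus
        s = lplus * l[i] * s - mu
    pivots.append(s + d[n - 1])
    return pivots


def count_below(d, l, mu):
    """Number of eigenvalues of L D L^T that are smaller than mu."""
    return sum(1 for x in dstqds_pivots(d, l, mu) if x < 0.0)
```

In an RQI-like iteration, such a count could be used to confirm that the current shift still lies on the expected side of the eigenvalue being sought, and to fall back to a bisection step when it does not.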

6.3  An  Enhancement  using  Submatrices 

In  this  section,  we  briefly  mention  a  novel  idea  that  facilitates  the  computation 
of  orthogonal  “eigenvectors”  of  eigenvalues  that  are  very  close  to  each  other  but  are  well- 
separated  from  the  rest  of  the  spectrum. 

Eigenvalues of a tridiagonal matrix can be equal only if an off-diagonal element
is  zero.  In  such  a  case,  the  tridiagonal  matrix  is  a  direct  sum  of  smaller  tridiagonals, 
and  orthogonal  eigenvectors  are  trivially  obtained  from  the  eigenvectors  of  these  disjoint 
submatrices  by  padding  them  with  zeros.  However,  as  Wilkinson  observed,  eigenvalues  can 
be  arbitrarily  close  without  any  off-diagonal  element  being  small  [136].  It  turns  out  that 
even  in  such  a  case,  a  good  orthogonal  basis  of  the  invariant  subspace  can  be  computed  by 
using  suitable,  possibly  overlapping,  submatrices.  Thus  we  can  use  the  following  scheme: 

Algorithm 6.3.1 [Computes orthogonal "eigenvectors" for tight clusters using
submatrices.]

1. For each of the "close" eigenvalues $\hat\lambda_j, \ldots, \hat\lambda_k$ (that are well-separated from the rest
of the spectrum), do the following:

(a) "Find" a submatrix $T^{p:q}$ with an isolated eigenvalue $\lambda$ which is "close" to the
cluster of eigenvalues.

(b) Compute the eigenvector of $\lambda$, i.e., solve $(T^{p:q} - \lambda I)s = 0$ for $s$.

(c) Output the vector $v$ as an eigenvector, where $v^{p:q} = s$ and the rest of $v$ is padded
with zeroes. □

For some of the theory underlying this scheme, the reader is referred to Parlett [112,
115]. Clearly, besides the existence of suitable submatrices, the crucial question is: how do
we choose the submatrices in Step (1a) of the above scheme? The computation of the
eigenvector of an isolated eigenvalue in Step (1b) is easily done by using the methods
discussed earlier.

We now have a more robust way of picking the appropriate submatrices than the
approaches outlined in [112, 115]. We have included this enhancement in our
implementation of Algorithm Y and found it to work accurately in practice. Note that the above
algorithm is an alternate way of computing orthogonal eigenvectors without doing any
explicit orthogonalization. The smaller the submatrix sizes in Step (1b) above, the less is the
work required to produce orthogonal eigenvectors.

It  is  beyond  the  scope  of  this  thesis  to  discuss  the  above  approach  in  greater  detail. 
The  theory  that  justifies  this  scheme  is  quite  involved  and  intricate,  and  a  cursory  treatment 
would  not  do  it  justice.  We  hope  to  present  more  details  in  the  near  future  [43]. 

6.4  Numerical  Results 

In  this  section,  we  present  a  numerical  comparison  between  Algorithm  Y  and  four 
other  software  routines  for  solving  the  symmetric  tridiagonal  eigenproblem  that  are  included 
in  the  EISPACK  [128]  and  LAPACK  [1]  libraries.  These  are 

1.  LAPACK  INVIT  :  The  LAPACK  implementation  of  bisection  and  inverse  iteration  [88, 
84,  87]  (subroutines  DSTEBZ  and  DSTEIN); 

2. EISPACK INVIT : The EISPACK implementation of inverse iteration after finding
the eigenvalues by bisection [118] (subroutine DSTEBZ from LAPACK followed by
TINVIT from EISPACK);

3. LAPACK D&C : The LAPACK implementation of the divide and conquer method
that uses a rank-one tear to subdivide the problem [71, 124] (subroutine DSTEDC);

4. LAPACK QR : The LAPACK implementation of the QR algorithm that uses
Wilkinson's shifts to compute both eigenvalues and eigenvectors [69] (subroutine DSTEQR).

6.4.1  Test  Matrices 

We have chosen many different types of tridiagonals as our test matrices. They
differ mainly in their eigenvalue distributions which highlight the sensitivity of the above
algorithms as discussed in Chapter 2. Some of our example tridiagonals come from quantum
chemistry applications. The first eleven among the following types of tridiagonal matrices
are obtained by Householder reduction of random dense symmetric matrices that have the
given eigenvalue distributions (see [36] for more on the generation of such matrices). The
matrix sizes for our tests range from 125 to 2000.

1) Uniform Distribution ($\varepsilon$ apart). $n-1$ eigenvalues uniformly distributed from $\varepsilon$ to
   $(n-1)\varepsilon$, and the $n$th eigenvalue at 1, i.e.,

   $\lambda_i = i \cdot \varepsilon$, $i = 1, 2, \ldots, n-1$, and $\lambda_n = 1$.

   These matrices are identical to the Type 1 matrices of Section 4.6.

2) Uniform Distribution ($\sqrt{\varepsilon}$ apart). One eigenvalue at $\varepsilon$, $n-2$ eigenvalues uniformly
   distributed from $1 + \sqrt{\varepsilon}$ to $1 + (n-2)\sqrt{\varepsilon}$, and the last eigenvalue at 2, i.e.,

   $\lambda_1 = \varepsilon$, $\lambda_i = 1 + (i-1) \cdot \sqrt{\varepsilon}$, $i = 2, \ldots, n-1$, and $\lambda_n = 2$.

   These are also identical to the Type 2 matrices of Section 4.6.

3) Uniform Distribution ($\varepsilon$ to 1). The eigenvalues are equi-spaced between $\varepsilon$ and 1,
   i.e.,

   $\lambda_i = \varepsilon + (i-1) \cdot \tau$, $i = 1, 2, \ldots, n$,

   where $\tau = (1 - \varepsilon)/(n-1)$.

4) Uniform Distribution ($\varepsilon$ to 1 with random signs). Identical to the above type of
   matrices except that a random $\pm$ sign is attached to the eigenvalues.

5) Geometric Distribution ($\varepsilon$ to 1). The eigenvalues are geometrically arranged between
   $\varepsilon$ and 1, i.e.,

   $\lambda_i = \varepsilon^{(n-i)/(n-1)}$, $i = 1, 2, \ldots, n$.

6) Geometric Distribution ($\varepsilon$ to 1 with random signs). Identical to the above type
   except that a random $\pm$ sign is attached to the eigenvalues.

7) Random. The eigenvalues come from a random, normal (0, 1) distribution.

8) Clustered at 1. $\lambda_1 = \varepsilon$, and $\lambda_2 \approx \lambda_3 \approx \cdots \approx \lambda_n \approx 1$.

9) Clustered at $\pm 1$. Identical to the above type except that a random $\pm$ sign is attached
   to the eigenvalues.

10) Clustered at $\varepsilon$. $\lambda_1 \approx \lambda_2 \approx \cdots \approx \lambda_{n-1} \approx \varepsilon$, and $\lambda_n = 1$.

11) Clustered at $\pm\varepsilon$. Identical to the above type of matrices except that a random $\pm$
    sign is attached to the eigenvalues.

12) (1,2,1) Matrix. These are the Toeplitz tridiagonal matrices with 2's on the diagonal
    and 1's as the off-diagonal elements. An $n \times n$ version of such a matrix has eigenvalues
    $4 \sin^2[k\pi/2(n+1)]$, $k = 1, \ldots, n$, and for the values of $n$ under consideration, these
    eigenvalues are not too close.
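The closed-form spectra in the list above can be generated directly. The following is a minimal sketch, assuming that the $\varepsilon$ of the text is the double precision machine epsilon and that the geometric type uses the exponent $(n-i)/(n-1)$, which yields ascending order; the function names are illustrative and do not come from the test software. Only the eigenvalue lists are produced, not the tridiagonal matrices themselves.

```python
import math
import sys

# Assumption: the epsilon of the text is the double precision machine epsilon.
EPS = sys.float_info.epsilon


def uniform_eps_apart(n, eps=EPS):
    """Type 1: i*eps for i = 1..n-1, plus one eigenvalue at 1."""
    return [i * eps for i in range(1, n)] + [1.0]


def uniform_sqrt_eps_apart(n, eps=EPS):
    """Type 2: eps, then 1 + (i-1)*sqrt(eps) for i = 2..n-1, then 2."""
    root = math.sqrt(eps)
    return [eps] + [1.0 + (i - 1) * root for i in range(2, n)] + [2.0]


def uniform_eps_to_one(n, eps=EPS):
    """Type 3: equi-spaced between eps and 1."""
    tau = (1.0 - eps) / (n - 1)
    return [eps + (i - 1) * tau for i in range(1, n + 1)]


def geometric_eps_to_one(n, eps=EPS):
    """Type 5: geometrically spaced between eps and 1, ascending (assumed exponent)."""
    return [eps ** ((n - i) / (n - 1)) for i in range(1, n + 1)]


def one_two_one(n):
    """Type 12: eigenvalues 4*sin^2(k*pi/(2(n+1))) of the (1,2,1) Toeplitz matrix."""
    return [4.0 * math.sin(k * math.pi / (2 * (n + 1))) ** 2 for k in range(1, n + 1)]
```

A quick sanity check is that the eigenvalues of the (1,2,1) matrix sum to its trace, $2n$.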

Matrices  of  types  3  through  11  above  are  LAPACK  test  matrices  and  are  used  to 
check  the  accuracy  of  all  LAPACK  software  for  the  symmetric  tridiagonal  eigenproblem.  In 
addition,  the  following  symmetric  tridiagonal  matrices  arise  in  certain  quantum  chemistry 
computations.  For  more  details  on  these  problems,  the  reader  is  referred  to  [10,  55]. 

13) Biphenyl. This positive definite matrix with $n = 966$ occurs in the modeling of
    biphenyl using Møller-Plesset theory. Most of its eigenvalues are quite small compared
    to the norm. See Figure 6.1 for a plot of its eigenvalue distribution.

14) SiOSi6. Density functional theory methods for determining bulk properties for the
    molecule SiOSi6 lead to this positive definite matrix with $n = 1687$. Most of the
    eigenvalues are quite close to their neighbors and there is no obvious subdivision of
    the spectrum into separate clusters. Figure 6.2 gives this distribution.

15) Zeolite ZSM-5. This $2053 \times 2053$ matrix occurs in the application of the self-consistent
    field (SCF) Hartree-Fock method for solving a non-linear Schrödinger problem. See
    Figure 6.3 for the spectrum.
Figure 6.1: Eigenvalue distribution for Biphenyl (horizontal axis: eigenvalue index)

Figure 6.2: Eigenvalue distribution for SiOSi6 (horizontal axis: eigenvalue index)

Figure 6.3: Eigenvalue distribution for Zeolite ZSM-5 (horizontal axis: eigenvalue index)
6.4.2 Timing and Accuracy Results

Tables 6.1 and 6.2 compare the times taken by the different algorithms
to find all the eigenvalues and eigenvectors of symmetric tridiagonal matrices of the types
discussed above. All the numerical experiments presented in this section were conducted
on an IBM RS/6000-590 processor that has a peak rating of 266 MFlops. Fortran BLAS,
instead of those from the machine-optimized ESSL library [82], were used in this preliminary
testing¹. We hope to include the ESSL BLAS in our future comparisons. The rest of the
tables, Tables 6.3 through 6.6, indicate the accuracy of the methods tested.

Our Algorithm Y may be thought of as an alternate way of doing inverse iteration.
Thus LAPACK INVIT and EISPACK INVIT are the two earlier methods that most closely
resemble Algorithm Y. Tables 6.1 and 6.2 show that Algorithm Y is always faster than both
these existing implementations. The difference in speed varies according to the eigenvalue
distribution: on matrices of order 2000, Algorithm Y is over 3500 times faster than
LAPACK INVIT when the eigenvalues are clustered around $\varepsilon$, while it is about 4 times faster
when the eigenvalues are well-separated. These different speedups highlight the sensitivity
of the various algorithms to the eigenvalue distribution. When eigenvalues are clustered,
EISPACK and LAPACK inverse iteration need $O(n^3)$ time, as is clear from Tables 6.1
and 6.2. On the other hand, they take $O(n^2)$ time when eigenvalues are isolated; see the
results for matrices of type 4 and type 7 in Table 6.1. We also draw the reader's attention to
the varying behavior on matrices with uniformly distributed eigenvalues (from $\varepsilon$ to 1), and
the (1,2,1) matrix. For small $n$, we see an $O(n^2)$ behavior but for larger $n$, both EISPACK
and LAPACK inverse iteration take $O(n^3)$ time. This discrepancy is due to the clustering
criterion, which is independent of $n$. More specifically, reorthogonalization is done when
eigenvalues differ by less than $10^{-3}\|T\|$. See Section 2.8.1 for more details. EISPACK's
implementation is always faster than LAPACK's but is generally less accurate. In fact, the
latter was designed to improve the accuracy of EISPACK [87].
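The effect of an $n$-independent clustering threshold can be illustrated numerically. The sketch below counts the fraction of eigenvalues that have a neighbor within $10^{-3}\|T\|$ for the uniform ($\varepsilon$ to 1) distribution. It assumes that $\|T\|$ is the largest eigenvalue in magnitude (the libraries may use a different norm), and the helper names are hypothetical.

```python
import math
import sys

EPS = sys.float_info.epsilon


def clustered_fraction(eigs, tol=1e-3):
    """Fraction of eigenvalues with a neighbour closer than tol * ||T||.

    Assumption: ||T|| is taken as max |lambda|, the 2-norm of a symmetric matrix.
    """
    eigs = sorted(eigs)
    norm = max(abs(x) for x in eigs)
    n = len(eigs)
    close = 0
    for i in range(n):
        left = eigs[i] - eigs[i - 1] if i > 0 else math.inf
        right = eigs[i + 1] - eigs[i] if i < n - 1 else math.inf
        if min(left, right) < tol * norm:
            close += 1
    return close / n


def uniform_eps_to_one(n, eps=EPS):
    """Equi-spaced eigenvalues between eps and 1 (type 3 of the test set)."""
    tau = (1.0 - eps) / (n - 1)
    return [eps + i * tau for i in range(n)]
```

Under these assumptions the spacing at $n = 500$ is about $2 \times 10^{-3}$, so no eigenvalue is flagged, whereas at $n = 2000$ the spacing drops to about $5 \times 10^{-4}$ and every eigenvalue falls into a single cluster, which is where the reorthogonalization cost becomes cubic.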

Algorithm Y is much less sensitive to the arrangement of eigenvalues: on matrices
of order 2000, the time taken by it ranges from about 30 seconds to about 60 seconds in
most cases. The two notable exceptions are the matrices where almost all eigenvalues are
clustered around $\varepsilon$ or $\pm\varepsilon$. It turns out that the eigenvectors of such matrices can be very
sparse; in fact, an identity matrix is a good approximation to an eigenvector matrix. By

¹ All code was compiled with the command line f77 -u -O, where the -O compiler option is identical to
the -O2 level of optimization.




                            Time Taken (in seconds)
Matrix Type /     LAPACK   EISPACK    LAPACK    LAPACK Algorithm
Matrix Size        INVIT     INVIT       D&C        QR         Y

Uniform Distribution ($\varepsilon$ apart)
    125             0.72      0.53      0.02      0.17      0.14
    250             3.52      2.27      0.10      1.23      0.52
    500            19.78     11.12      0.42      8.92      1.96
    1000          123.91     60.24      2.13     68.06      8.91
    2000          858.01    369.68     13.18    534.88     31.87

Uniform Distribution ($\sqrt{\varepsilon}$ apart)
    125             0.55      0.35      0.14      0.16      0.40
    250             2.87      1.63      0.69      1.16      1.54
    500            17.76      8.76      4.09      7.33      5.94
    1000          114.32     50.94     26.34     63.35     23.17
    2000          825.91    332.95    178.74    453.93     92.26

Uniform Distribution ($\varepsilon$ to 1)
    125             0.53      0.46      0.16      0.18      0.14
    250             2.06      1.81      0.74      1.21      0.56
    500             8.08      7.08      4.27      8.61      1.96
    1000          124.65     27.55     26.85     68.32      8.04
    2000          858.55    370.35    182.64    535.94     31.91

Uniform Distribution ($\varepsilon$ to 1 with random signs)
    125             0.54      0.47      0.15      0.18      0.14
    250             2.11      1.85      0.73      1.19      0.54
    500             8.20      7.18      4.25      8.35      1.97
    1000           32.05     27.81     26.85     61.58      8.17
    2000          127.53    109.39    183.26    468.01     32.30

Geometric Distribution ($\varepsilon$ to 1)
    125             0.69      0.54      0.05      0.14      0.13
    250             3.29      2.29      0.25      1.03      0.48
    500            17.72     10.59      1.22      7.66      1.86
    1000          108.90     55.60      6.70     58.98      7.25
    2000          759.77    335.68     41.50    442.96     28.73

Geometric Distribution ($\varepsilon$ to 1 with random signs)
    125             0.66      0.50      0.05      0.15      0.72
    250             3.29      2.29      0.24      0.96      2.97
    500            17.44     10.37      1.17      5.96     12.79
    1000          104.68     53.95      6.37     37.34     53.39
    2000          711.83    313.82     38.18    247.43    191.82

Random
    125             0.53      0.47      0.14      0.18      0.21
    250             2.07      1.83      0.61      1.21      0.99
    500             8.20      7.18      3.54      8.20      5.43
    1000           32.18     27.81     21.87     60.75     13.38
    2000          132.16    109.83    147.50    459.67     61.92

Table 6.1: Timing Results
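One way to read growth rates off Table 6.1 is to estimate the exponent $p$ in time $\sim n^p$ from consecutive matrix sizes. The sketch below does this for the uniform ($\varepsilon$ apart) rows; the timings are copied from the table, while the function name and the log-ratio estimator are illustrative choices rather than part of the original experiments.

```python
import math

# Times in seconds copied from Table 6.1, uniform distribution (eps apart).
SIZES = [125, 250, 500, 1000, 2000]
LAPACK_INVIT = [0.72, 3.52, 19.78, 123.91, 858.01]
ALGORITHM_Y = [0.14, 0.52, 1.96, 8.91, 31.87]


def growth_exponents(sizes, times):
    """Estimate p in time ~ n**p from each pair of consecutive sizes."""
    pairs = list(zip(sizes, times))
    return [
        math.log(t1 / t0) / math.log(n1 / n0)
        for (n0, t0), (n1, t1) in zip(pairs, pairs[1:])
    ]
```

For the step from $n = 1000$ to $n = 2000$ this gives an exponent near 2.8 for LAPACK INVIT and near 1.8 for Algorithm Y, in line with cubic and quadratic behavior respectively.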
                            Time Taken (in seconds)
Matrix Type /     LAPACK   EISPACK    LAPACK    LAPACK Algorithm
Matrix Size        INVIT     INVIT       D&C        QR         Y

Clustered at $\varepsilon$
    125             0.69      0.47      0.02      0.14      0.01
    250             3.34      2.10      0.04      0.98     0.001
    500            18.87     10.19      0.13      7.22      0.04
    1000          120.34     56.56      0.42     56.65      0.09
    2000          843.89    353.89      1.42    432.27      0.23

Clustered at $\pm\varepsilon$
    125             0.68      0.49      0.02      0.14      0.01
    250             3.36      2.21      0.03      1.03      0.02
    500            18.98      0.11      7.26       4.0      0.03
    1000          120.89     57.09     0.350     54.68      0.09
    2000          846.84    356.07      1.18    416.25      0.25

Clustered at 1
    125             0.30      0.19      0.01      0.07      0.04
    250             3.54      2.72      0.04      0.46      0.15
    500            13.56     12.66      0.14      3.96      0.68
    1000          100.04    101.29      0.39     27.64      3.77
    2000          767.49    952.61      1.75    223.66     17.05

Clustered at $\pm 1$
    125             0.23      0.14      0.01      0.09      0.06
    250             1.27      0.74      0.03      0.60      0.17
    500             7.76      4.83      0.12      4.24      0.70
    1000           54.03     36.17     0.350     29.98      2.44
    2000          398.60    313.47      1.23    225.42     17.17

(1,2,1) Matrix
    125             0.51      0.46      0.15      0.18      0.14
    250             1.82      0.70      0.45      1.18      0.57
    500             8.21      7.13      2.69      8.47      2.33
    1000           40.16     29.30     18.15     64.31      7.89
    2000          857.84    194.14    129.05    491.40     32.87

Biphenyl
    966           107.72     53.26     19.58     44.25     10.26

SiOSi6
    1687          360.21    172.86     97.49    248.07     41.61

Zeolite ZSM-5
    2053          281.02    161.28    145.65    376.65    106.25

Table 6.2: Timing Results
Maximum Residual Norm

Matrix 

Matrix 

LAPACK 

EISPACK 

LAPACK 

LAPACK 

Algorithm 

Type 

Size 

INVIT 

INVIT 

D&C 

QR 

Y 

125 

0.005 

0.377 

0.004 

Uniform 

250 

0.002 

0.601 

0.002 

Distribution 

500 

0.0001 

0.043 

0.0003 

0.0001 

(e  apart) 

1000 

0.0005 

0.728 

0.0005 

0.0005 

2000 

0.00002 

0.153 

0.00003 

0.002 

125 

1.83 

0.224 

0.130 

Uniform 

250 

msm 

2.60 

0.200 

0.197 

Distribution 

500 

0.030 

7.62 

0.358 

0.015 

($\sqrt{\varepsilon}$ apart)

1000 

0.012 

5.94 

0.181 

0.146 

2000 

0.001 

11.15 

0.197 

0.089 

125 

0.056 

3.31 

0.365 

Uniform 

250 

0.039 

6.19 

0.375 

Distribution 

500 

0.023 

5.33 

0.610 

0.025 

(e  to  1) 

1000 

0.023 

10.92 

0.410 

0.012 

2000 

0.017 

7.19 

0.334 

0.008 

Uniform 

125 

0.029 

1.96 

0.624 

Distribution 

250 

0.034 

4.77 

0.586 

(e  to  1  with 

500 

0.023 

5.33 

0.610 

0.025 

random  signs) 

1000 

0.019 

9.71 

0.589 

0.018 

2000 

0.012 

11.74 

0.569 

0.012 

125 

0.483 

0.048 

Geometric 

250 

■o 

0.882 

0.034 

Distribution 

500 

HU 

1.11 

0.038 

0.004 

(e  to  1) 

1000 

HU 

1.55 

0.040 

0.003 

2000 

HU 

1.90 

0.057 

0.040 

0.002 

Geometric 

125 

0.391 

0.081 

0.041 

Distribution 

250 

0.420 

0.065 

0.045 

(e  to  1  with 

500 

0.003 

0.902 

0.051 

0.048 

random  signs) 

1000 

0.003 

1.91 

0.061 

2000 

0.002 

2.07 

0.055 

125 

0.016 

0.993 

0.083 

0.373 

250 

0.011 

1.75 

0.087 

0.256 

Random 

500 

0.007 

1.79 

0.079 

0.354 

1000 

0.011 

1.75 

0.087 

0.256 

2000 

0.005 

4.15 

0.092 

0.383 

Table 6.3: Maximum Residual Norms $= \max_i \|T\hat{v}_i - \hat{\lambda}_i \hat{v}_i\| / n\varepsilon\|T\|$
Matrix

Type 

Matrix 

Size 

Maximum  Residual  Norm 

LAPACK 

INVIT 

EISPACK 

INVIT 

LAPACK 

D&C 

LAPACK 

QR 

Algorithm 

Y 

125 

0.004 

0.429 

0.011 

0.004 

Clustered 

250 

0.0005 

0.041 

0.002 

0.004 

at $\varepsilon$

500 

0.0002 

0.115 

0.001 

1000 

0.000007 

0.125 

0.0005 

0.0006 

2000 

0.0003 

0.400 

0.00004 

0.0003 

125 

0.004 

0.739 

0.014 

0.007 

Clustered 

250 

0.004 

0.854 

0.007 

0.002 

at  ±e 

500 

0.00003 

0.079 

0.004 

0.0003 

1000 

0.0005 

0.395 

0.002 

0.0008 

2000 

0.000002 

0.109 

0.001 

^■sn 

0.0008 

125 

1.31 

1.23 

0.156 

0.245 

0.251 

Clustered 

250 

1.45 

1.14 

0.190 

0.269 

0.168 

at  1 

500 

1.46 

1.37 

0.096 

0.308 

0.146 

1000 

1.52 

1.35 

0.067 

0.346 

0.107 

2000 

1.62 

1.45 

0.036 

0.340 

0.091 

125 

0.704 

9.53 

0.224 

0.399 

Clustered 

250 

0.712 

4.55 

0.138 

0.611 

msm 

at  ±1 

500 

0.664 

17.78 

0.074 

0.565 

■HI 

1000 

0.604 

22.10 

0.048 

0.563 

0.083 

2000 

0.625 

33.95 

0.034 

0.549 

0.076 

125 

0.047 

4.54 

0.164 

0.431 

0.100 

(1,2,1) 

250 

0.034 

8.31 

0.130 

0.411 

0.314 

Matrix 

500 

0.031 

21.32 

0.109 

0.400 

0.459 

1000 

0.035 

57.72 

0.105 

0.484 

0.116 

2000 

0.029 

181.37 

0.102 

0.618 

0.227 

Biphenyl 

966 

0.708 

MKjng 

0.004 

0.0009 

SiOSi6 

1687

179.87 

mam 

0.054 

0.020 

Zeolite  ZSM-5 

2053 

16.98 

0.097 

0.150 

Table 6.4: Maximum Residual Norms $= \max_i \|T\hat{v}_i - \hat{\lambda}_i \hat{v}_i\| / n\varepsilon\|T\|$
Maximum Dot Product

Matrix 

Matrix 

LAPACK 

EISPACK 

LAPACK 

LAPACK 

Algorithm 

Type 

Size 

INVIT 

INVIT 

D&C 

QR 

Y 

125 

0.056 

0.272 

0.204 

0.067 

Uniform 

250 

0.046 

0.220 

0.108 

0.120 

Distribution 

500 

0.024 

0.216 

0.018 

0.068 

0.085 

(e  apart) 

1000 

0.023 

0.257 

0.017 

0.045 

0.084 

2000 

0.017 

0.260 

0.010 

0.032 

0.073 

125 

0.056 

0.200 

0.064 

0.136 

0.091 

Uniform 

250 

0.044 

0.196 

0.060 

0.108 

0.108 

Distribution 

500 

0.030 

0.234 

0.036 

0.074 

0.084 

($\sqrt{\varepsilon}$ apart)

1000 

0.022 

0.217 

0.047 

0.059 

0.159 

2000 

0.014 

0.230 

0.028 

0.091 

125 

0.048 

6.20 

0.080 

0.184 

Uniform 

250 

0.049 

25.48 

0.056 

0.106 

Distribution 

500 

0.046 

12.60 

0.058 

0.081 

(e  to  1) 

1000 

0.020 

61.44 

0.048 

0.062 

0.118 

2000 

0.015 

0.242 

0.046 

0.047 

0.170 

Uniform 

125 

0.119 

3.93 

0.096 

0.104 

0.070 

Distribution 

250 

0.106 

8.23 

0.052 

0.076 

0.114 

(e  to  1  with 

500 

0.132 

12.67 

0.105 

0.132 

random  signs) 

1000 

0.028 

17.67 

0.060 

0.140 

2000 

0.028 

17.23 

^B^l 

0.025 

0.228 

125 

0.052 

0.432 

0.096 

0.164 

Geometric 

250 

0.066 

0.308 

0.054 

0.104 

Distribution 

500 

0.036 

0.528 

0.036 

0.102 

(e  to  1) 

1000 

0.024 

0.360 

0.018 

0.058 

2000 

0.017 

0.330 

0.014 

0.052 

0.020 

Geometric 

125 

0.056 

0.192 

0.088 

0.192 

1.77 

Distribution 

250 

0.036 

0.248 

0.054 

0.144 

0.731 

(e  to  1  with 

500 

0.035 

0.246 

0.042 

0.140 

0.337 

random  signs) 

1000 

0.022 

0.244 

0.022 

0.066 

0.330 

2000 

0.016 

0.575 

0.020 

0.053 

127499.0 

125 

6.03 

0.096 

0.184 

0.839 

250 

13.63 

0.050 

0.112 

0.337 

Random 

500 

10.36 

0.050 

0.062 

0.254 

1000 

26.16 

0.031 

0.112 

0.415 

2000 

7.30 

0.042 

0.039 

0.610 

Table 6.5: Maximum Dot Products $= \max_{i \neq j} |\hat{v}_i^T \hat{v}_j| / n\varepsilon$
Matrix

Type 

Matrix 

Size 

Maximum  Dot  Product 

LAPACK 

INVIT 

EISPACK 

INVIT 

LAPACK 

D&C 

LAPACK 

QR 

Algorithm 

Y 

125 

0.052 

0.829 

0.160 

0.008 

Clustered 

250 

0.044 

0.605 

HQ 

0.144 

0.004 

at $\varepsilon$

500 

0.034 

0.226 

0.020 

0.088 

0.002 

1000 

0.022 

0.626 

0.008 

0.067 

0.0007 

2000 

0.021 

2.78 

0.005 

0.048 

0.0005 

125 

0.705 

0.152 

0.014 

Clustered 

250 

0.455 

■mg 

0.120 

0.008 

at  ±e 

500 

0.029 

0.194 

0.022 

0.092 

0.003 

1000 

0.032 

14.94 

0.024 

0.082 

0.168 

2000 

0.015 

0.157 

0.004 

0.043 

0.0005 

125 

0.048 

1.26 

0.044 

0.064 

0.008 

Clustered 

250 

0.036 

2.96 

0.020 

0.062 

0.0008 

at  1 

500 

0.023 

0.738 

0.012 

0.052 

0.003 

1000 

0.019 

3.80 

0.032 

0.001 

2000 

0.015 

17720798.0 

■mg 

0.034 

0.0002 

125 

3.44 

0.048 

0.116 

Clustered 

250 

14.94 

0.024 

0.082 

at  ±1 

500 

0.022 

625.94 

0.016 

0.068 

0.027 

1000 

0.015 

296.76 

0.007 

0.045 

0.018 

2000 

0.016 

63357.9 

0.004 

0.048 

125 

63.24 

0.084 

0.623 

(1,2,1) 

250 

20.03 

0.028 

0.284 

Matrix 

500 

15.52 

0.030 

■■ 

0.308 

1000 

5.02 

0.032 

0.759 

2000 

■B 

1.13 

0.027 

■mg 

0.662 

Biphenyl 

966 

0.018 

0.162 

0.030 

0.073 

0.859 

SiOSi6 

1687

0.012 

70.86 

0.033 

0.070 

0.698 

Zeolite  ZSM-5 

2053 

0.010 

1.41 

0.045 

0.049 

0.382 

Table 6.6: Maximum Dot Products $= \max_{i \neq j} |\hat{v}_i^T \hat{v}_j| / n\varepsilon$
invoking the ideas of Section 6.3, submatrices of very small size are needed by our new
algorithm to compute orthogonal eigenvectors. As a result, Algorithm Y takes as little
as 0.25 seconds to find all the eigenpairs of such a matrix of size 2000, while traditional
LAPACK inverse iteration takes about 840 seconds. All our timing results confirm Algorithm Y to
be an $O(n^2)$ method. The new technique of finding the eigenvector of an isolated eigenvalue
by using twisted factorizations (see Chapter 3 for details) implies that our new algorithm is
faster even in cases that are favorable for EISPACK and LAPACK's inverse iteration, i.e.,
matrices of type 4 and type 7.
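To make the twisted factorization idea concrete, here is a minimal sketch for a single isolated eigenvalue. It is not the differential (dqds-like) formulation developed in Chapter 3: it uses the plain top-down and bottom-up recurrences for $T - \lambda I$, and the zero-pivot safeguard `tiny` is a hypothetical addition. The names `twisted_eigenvector`, `a` (diagonal) and `b` (off-diagonal) are illustrative.

```python
import math


def twisted_eigenvector(a, b, lam):
    """Approximate eigenvector of T = tridiag(b, a, b) for the eigenvalue
    approximation lam, using a twisted factorization of T - lam*I.
    Returns the unit-norm vector and the twist index k.
    """
    n = len(a)
    tiny = 1e-300  # hypothetical safeguard against an exactly zero pivot

    # Top-down factorization T - lam*I = L D L^T.
    d = [0.0] * n
    l = [0.0] * (n - 1)
    d[0] = a[0] - lam
    for i in range(n - 1):
        if d[i] == 0.0:
            d[i] = tiny
        l[i] = b[i] / d[i]
        d[i + 1] = (a[i + 1] - lam) - l[i] * b[i]

    # Bottom-up factorization T - lam*I = U R U^T.
    r = [0.0] * n
    u = [0.0] * (n - 1)
    r[n - 1] = a[n - 1] - lam
    for i in range(n - 2, -1, -1):
        if r[i + 1] == 0.0:
            r[i + 1] = tiny
        u[i] = b[i] / r[i + 1]
        r[i] = (a[i] - lam) - u[i] * b[i]

    # gamma[k] is the twist element when the two factorizations meet at row k.
    gamma = [d[k] + r[k] - (a[k] - lam) for k in range(n)]
    k = min(range(n), key=lambda j: abs(gamma[j]))

    # Solve N^T z = e_k: one multiplication per entry, no additions.
    z = [0.0] * n
    z[k] = 1.0
    for i in range(k - 1, -1, -1):
        z[i] = -l[i] * z[i + 1]
    for i in range(k, n - 1):
        z[i + 1] = -u[i] * z[i]

    norm = math.sqrt(sum(x * x for x in z))
    return [x / norm for x in z], k
```

The work is $O(n)$ for one eigenvector, and no other eigenvector is consulted, which is what allows the eigenvectors to be computed independently.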

We mentioned the uncertainty in implementing certain steps of Algorithm Y at
the beginning of this chapter. The reason is the lack of a complete theory behind multiple
representations. This uncertainty has led to some redundancy and less than satisfactory fixes
(such as a tolerance on the acceptable element growth) in our preliminary code, which will
hopefully be eliminated in the near future. In addition, whenever we form a relatively robust
representation we need to recompute the locally small eigenvalues to high relative accuracy.
Thus, even though it is still an $O(n^2)$ process, our code slows down somewhat when it needs
to form many representations. For these reasons, the matrix with eigenvalues arranged
in geometric order from $\varepsilon$ to 1 (with random signs) is the hardest one for Algorithm Y,
as evinced by the time needed and an occasionally larger than acceptable deviation from
orthogonality. In contrast, Algorithm Y performs very well on a similar type of matrix
where the eigenvalues are geometrically distributed from $\varepsilon$ to 1 (without any random signs
attached to them). The upcoming Section 6.5 lists some possible algorithmic improvements
that should remove this discrepancy in performance on these similar types of matrices.
Tables 6.3-6.6 show that the residual norms and orthogonality of the eigenpairs computed
by our current implementation of Algorithm Y are generally very good.
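The two accuracy measures reported in Tables 6.3 through 6.6 can be computed as in the following sketch. It assumes that $\|T\|$ is the 1-norm (maximum absolute column sum) and that $\varepsilon$ is the double precision machine epsilon; the thesis may use a different norm. The function names are illustrative.

```python
import sys

EPS = sys.float_info.epsilon


def tridiag_matvec(a, b, v):
    """Multiply T = tridiag(b, a, b) by the vector v."""
    n = len(a)
    out = []
    for i in range(n):
        s = a[i] * v[i]
        if i > 0:
            s += b[i - 1] * v[i - 1]
        if i < n - 1:
            s += b[i] * v[i + 1]
        out.append(s)
    return out


def norm2(v):
    return sum(x * x for x in v) ** 0.5


def accuracy_measures(a, b, eigvals, eigvecs):
    """Return (max residual norm / (n*eps*||T||), max |v_i^T v_j| / (n*eps))."""
    n = len(a)
    # Assumption: ||T|| is the 1-norm (maximum absolute column sum).
    t_norm = max(
        abs(a[i])
        + (abs(b[i - 1]) if i > 0 else 0.0)
        + (abs(b[i]) if i < n - 1 else 0.0)
        for i in range(n)
    )
    max_res = 0.0
    for lam, v in zip(eigvals, eigvecs):
        tv = tridiag_matvec(a, b, v)
        max_res = max(max_res, norm2([x - lam * y for x, y in zip(tv, v)]))
    max_dot = 0.0
    for i in range(len(eigvecs)):
        for j in range(i + 1, len(eigvecs)):
            dot = sum(x * y for x, y in zip(eigvecs[i], eigvecs[j]))
            max_dot = max(max_dot, abs(dot))
    return max_res / (n * EPS * t_norm), max_dot / (n * EPS)
```

Values of order 1 or smaller for both measures indicate eigenpairs that are accurate to working precision.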

The divide and conquer method is the fastest among all the earlier algorithms and
on examples where eigenvalues are close, it outperforms our preliminary implementation of
Algorithm Y. This success is due to the deflation process, where an eigenpair of a submatrix
is found to be an acceptable eigenpair of the full matrix. However, in cases where the
eigenvalues are well-separated, the divide and conquer algorithm takes $O(n^3)$ time and is
slower than Algorithm Y. One major drawback of LAPACK D&C is its extra workspace
requirement of more than $2n^2$ double precision words, which can be prohibitively excessive.
In fact, in the recent past, some people have turned to LAPACK INVIT instead of divide
and conquer to solve their large problems solely for this reason [6, 53, 52]. Of course,
workspace of only about $5n$ to $10n$ double precision words is required by Algorithm Y, and by
EISPACK and LAPACK INVIT.
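The difference between the two workspace requirements is easy to quantify. The sketch below assumes 8-byte double precision words and uses the bounds quoted above ($2n^2$ words for divide and conquer, at most about $10n$ words for the other methods); the function names are illustrative.

```python
BYTES_PER_DOUBLE = 8  # assumption: IEEE double precision


def dc_workspace_bytes(n):
    """Lower bound quoted in the text: more than 2*n**2 double precision words."""
    return 2 * n * n * BYTES_PER_DOUBLE


def linear_workspace_bytes(n, words_per_row=10):
    """Upper end of the 'about 5-10 n' range quoted in the text."""
    return words_per_row * n * BYTES_PER_DOUBLE
```

For $n = 2000$ this amounts to at least 64 million bytes of extra workspace for divide and conquer, against about 160 thousand bytes for Algorithm Y or inverse iteration.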

The QR method is seen to uniformly take $O(n^3)$ time and is less sensitive to
the eigenvalue distribution. It is quite competitive with the other methods on matrices of
small size. Both the QR method and the divide and conquer algorithm produce accurate
eigenpairs, as seen from Tables 6.3, 6.4, 6.5 and 6.6.

We now discuss performance of the various algorithms on matrices that arise from
quantum chemistry applications. Table 6.2 shows that our Algorithm Y is the fastest
method for these matrices. In the results discussed above, the matrices were characterized
by their eigenvalue distributions. In Figures 6.1, 6.2 and 6.3, we did give the eigenvalues of
the three matrices under consideration. However, the quantities of interest are the absolute
and relative gaps between the eigenvalues. The left halves of Figures 6.4, 6.5 and 6.6 plot
the logarithms of absolute gaps, i.e.,

$$\mathrm{absgap}(i) = \log_{10} \frac{\min(\lambda_{i+1} - \lambda_i,\ \lambda_i - \lambda_{i-1})}{\|T\|}, \qquad (6.4.5)$$

while the right halves plot the relative gaps on a logarithmic scale, i.e.,

$$\mathrm{relgap}(i) = \log_{10} \frac{\min(\lambda_{i+1} - \lambda_i,\ \lambda_i - \lambda_{i-1})}{|\lambda_i|}. \qquad (6.4.6)$$

Recall that in (6.4.5) and (6.4.6), the eigenvalues are arranged in ascending order.
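The quantities plotted in Figures 6.4 through 6.6 can be computed from a sorted spectrum as sketched below. The sketch assumes that the absolute gap is scaled by $\|T\|$, taken here as the largest eigenvalue in magnitude, that the relative gap is scaled by $|\lambda_i|$, and that the eigenvalues are distinct and nonzero; at the two ends of the spectrum only the single existing neighbor is used. The function name is illustrative.

```python
import math


def log_gaps(eigs):
    """Return (absgap, relgap) lists of base-10 logarithms of the gaps.

    Assumptions: eigenvalues are distinct and nonzero, len(eigs) >= 2,
    and ||T|| is taken as max |lambda|.
    """
    eigs = sorted(eigs)
    n = len(eigs)
    norm = max(abs(x) for x in eigs)
    absgap, relgap = [], []
    for i in range(n):
        left = eigs[i] - eigs[i - 1] if i > 0 else math.inf
        right = eigs[i + 1] - eigs[i] if i < n - 1 else math.inf
        gap = min(left, right)
        absgap.append(math.log10(gap / norm))
        relgap.append(math.log10(gap / abs(eigs[i])))
    return absgap, relgap
```

For the spectrum `[1e-6, 2e-6, 1.0]`, the smallest eigenvalue has an absolute gap of about $10^{-6}$ (logarithm near $-6$) but a relative gap of about 1 (logarithm near 0): it is close to its neighbor in the absolute sense and well separated in the relative sense.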

From Figure 6.4, we see that almost all the absolute gaps for the biphenyl matrix
are less than $10^{-3}\|T\|$ and consequently, both LAPACK and EISPACK INVIT spend $O(n^3)$
time on this matrix of order 966. Most of the eigenvalues, however, agree in less than 3
leading digits and as a result, Algorithm Y is about 10 times faster than LAPACK INVIT.
This translates to a 3-fold increase in speed of the total dense symmetric eigenproblem.
Similar behavior is observed in the SiOSi6 matrix. We see a different distribution in the
Zeolite example, where the relative gaps are similar to the absolute gaps. Despite the need
to form many representations, we observe that Algorithm Y is still faster than the earlier
methods.


We have also implemented a parallel variant of Algorithm Y in collaboration with
computational chemists at Pacific Northwest National Laboratories (PNNL). This will replace
the earlier tridiagonal eigensolver based on inverse iteration that was part of the
PeIGS version 2.0 library [52]. Our new method is more easily parallelized, and coupled
with its lower operation count, offers an even bigger computational advantage on a parallel
computer. For example, on the biphenyl matrix our new parallel implementation is 100
times faster on a 128 processor IBM SP2 machine. More performance results are given
in [39].

Figure 6.4: Absolute and Relative Eigenvalue Gaps for Biphenyl

Figure 6.5: Absolute and Relative Eigenvalue Gaps for SiOSi6

6.5  Future  Enhancements  to  Algorithm  Y 

We  now  list  various  ways  in  which  our  current  implementation  of  Algorithm  Y 
may  be  improved. 

Multiple Representations. We would like to complete the relative perturbation theory
for factorizations of indefinite matrices. Some preliminary results and conjectures
were presented in Chapter 5. We plan to investigate using twisted factorizations as
relatively robust representations. We believe that completing this theory will lead to
a more elegant and efficient implementation.

Figure 6.6: Absolute and Relative Eigenvalue Gaps for Zeolite ZSM-5

Choice of Base Representation. In both Algorithms X and Y, the first step is to form a
base representation by translating the matrix and making it positive definite. We do so
because of the well known relative perturbation properties of the resulting bidiagonal
Cholesky factor. However, we could instead form a relatively robust factorization of
an indefinite matrix. As seen from the R.R.R. trees in Figures 5.1 and 5.2, a judicious
choice of the base representation may save us from forming multiple representations
and hence, result in greater speed. For example, by forming a base representation near
zero, we may be able to substantially reduce the amount of time spent by Algorithm Y
on matrices with eigenvalues that are geometrically distributed from $\varepsilon$ to 1 (with
random signs). In the near future, we hope to incorporate a measure of "goodness"
with each shift at which we can form a relatively robust representation, and start with
the "best" possible base representation.

Exploiting sparse eigenvectors. As seen in Figure 4.6, a majority of the entries in an
eigenvector may be negligible (this figure actually plots an eigenvector of the biphenyl
matrix). We can make our implementation more efficient by setting these negligible
entries to zero, in an a priori manner. In fact, the submatrix ideas of Section 6.3 do
produce sparse eigenvectors for clusters, and can greatly reduce the amount of work
needed as seen in the previous section. Even isolated eigenvalues can have sparse
eigenvectors, and we intend to exploit this sparsity and also push the submatrix ideas
further to get a more efficient code. We believe that by doing so, we can get an
algorithm that is uniformly faster than the divide and conquer method.

Exploiting Level-2 BLAS-like operations. Different eigenvectors are computed independently
of each other by Algorithm Y. By computing multiple eigenvectors at the
same time, we would be able to vectorize the arithmetic operations, and exploit multiple
functional units in modern computer architectures such as the IBM RS/6000
processor. Such a strategy has been adopted to speed up the LAPACK implementation
of bisection on some computer architectures [107, 1]. We say that such a strategy
leads to a level-2 BLAS-like operation since it involves a quadratic amount of data
and a quadratic amount of work [67].

We hope to resolve the above mentioned algorithmic enhancements in the near future.
These should result in Algorithm Z, and be included in future releases of LAPACK [1]
and ScaLAPACK [22].

Finally, we remind the reader that the timing and accuracy results presented in
this chapter must be considered to be transitory. Future improvements could lead to a further
increase in speed of our new $O(n^2)$ algorithm. All the earlier algorithms to which we compare
our new methods have been researched for more than 15 years, and we ask the kind reader
for some patience as we try to produce an algorithm that is provably correct in floating
point arithmetic while simultaneously being (i) $O(n^2)$ and (ii) embarrassingly parallel.
Chapter 7

Conclusions 


In this thesis, we have shown how to compute numerically orthogonal approximations
to eigenvectors without resorting to any reorthogonalization technique such as the
Gram-Schmidt process. Consequently, our new algorithm can compute orthogonal eigenvectors
in $O(n^2)$ time. The individual eigenvectors are computed independently of each
other, which is well-suited for parallel computation. Moreover, our algorithm can deliver
any subset of the eigenvalues and eigenvectors at a cost of $O(n)$ operations per eigenpair.

We now discuss some of the guiding principles behind the advances mentioned
above. We feel that these principles are general in the sense that they could be applied to
other numerical procedures. In the following, we list these principles and show how we have
applied them to get a faster $O(n^2)$ solution to the tridiagonal eigenproblem.

Do not try to compute a quantity that is not well-determined. When eigenvalues
are close, individual eigenvectors are extremely sensitive to small changes in the matrix
entries. On the other hand, the invariant subspace corresponding to these close
eigenvalues is well-determined and any orthogonal basis of this subspace will suffice
as approximate eigenvectors. Earlier EISPACK and LAPACK software attempt to
compute such a (non-unique) basis by explicit orthogonalization and hence, can take
$O(n^3)$ time. However, as explained in Chapter 5, we have been able to find alternate
representations of the initial matrix so that the refined eigenvalues are no longer
"close" and the corresponding eigenvectors are now well-determined. These new representations
allow us to identify a robust orthogonal basis of the desired subspace and
hence, compute it cheaply. Another major advance in our methods comes by recognizing
that the bidiagonal factors of a tridiagonal determine the desired quantities
much "better" than its diagonal and off-diagonal entries.

Exploiting transformations for which the solution is invariant. Eigenvalues of a matrix
$A$ are invariant under the similarity transformations $S^{-1}AS$, and change in a simple
way under affine transformations of the form $\alpha A + \beta I$, $\alpha \neq 0$. Several methods such
as the QR, LR and qd algorithms take advantage of such transformations. Eigenvectors
of $A$ are easily seen to be invariant under any affine transformation. We exploit
this property to great advantage by obtaining various representations of the given
problem, each of which is "better" suited for computing a subset of the spectrum.
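The invariance claims above follow in one line each. The derivation below is a short worked version, writing the affine transformation with an explicit identity matrix $I$ and using $v$ for an eigenvector of $A$ with eigenvalue $\lambda$; these symbols are introduced here only for illustration.

```latex
% Affine transformation: the eigenvector is unchanged, the eigenvalue is mapped affinely.
A v = \lambda v
\;\Longrightarrow\;
(\alpha A + \beta I)\, v = \alpha \lambda v + \beta v = (\alpha \lambda + \beta)\, v .

% Similarity transformation: the eigenvalue is unchanged, the eigenvector becomes S^{-1} v.
A v = \lambda v
\;\Longrightarrow\;
(S^{-1} A S)(S^{-1} v) = S^{-1} A v = \lambda\, (S^{-1} v) .
```

In particular, translating $T$ to $T - \mu I$ shifts every eigenvalue by $\mu$ and leaves every eigenvector untouched, so an eigenvector computed from a factorization of a translate is an eigenvector of $T$ itself.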

Finding "good" representations. We need to make sure that the above transformations
do not affect the accuracy of the desired quantities. Thus to produce eigenpairs that
have small residual norms, as specified in (1.1.1), we need to avoid large element
growths in forming the intermediate factorizations

$$T - \mu I = LDL^T. \qquad (7.0.1)$$

See Chapter 5 for details.
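As a small illustration of the factorization (7.0.1) and of element growth, the sketch below factors $T - \mu I$ without pivoting and multiplies the factors back together. The function names are hypothetical, and reporting the largest $|d_i|$ is only a crude stand-in for the element growth measures discussed in Chapter 5.

```python
def ldl_shifted(a, b, mu):
    """Factor T - mu*I = L D L^T for T = tridiag(b, a, b), without pivoting.

    Returns (d, l): the diagonal of D and the subdiagonal of the unit
    lower bidiagonal L. A zero pivot raises ZeroDivisionError.
    """
    n = len(a)
    d = [a[0] - mu]
    l = []
    for i in range(n - 1):
        l.append(b[i] / d[i])
        d.append((a[i + 1] - mu) - l[i] * b[i])
    return d, l


def rebuild(d, l):
    """Multiply L D L^T back out into (diagonal, off-diagonal) entries."""
    diag = [d[0]] + [d[i + 1] + l[i] * l[i] * d[i] for i in range(len(l))]
    off = [l[i] * d[i] for i in range(len(l))]
    return diag, off
```

For the $3 \times 3$ (1,2,1) matrix, the shift $\mu = 0.5$ gives pivots of modest size, whereas a shift such as $\mu = 2 - 10^{-8}$, which lies very close to an eigenvalue of the leading $1 \times 1$ block, produces a pivot of magnitude about $10^8$: an instance of the large element growth that a good representation has to avoid.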

High Accuracy in key computations can lead to speed. The main reason for forming
multiple representations, as in (7.0.1), is to avoid Gram-Schmidt like methods in
computing orthogonal approximations to the eigenvectors. Our faster $O(n^2)$ scheme
becomes possible only because the intermediate representations determine their small
eigenvalues to high relative accuracy. We can then compute these small eigenvalues
to full relative accuracy and use these highly accurate eigenvalues to compute eigenvectors.
The resulting eigenvectors we compute are "faithful" and orthogonal as a
consequence. In numerical methods, there is generally believed to be a trade-off between
speed and accuracy, i.e., higher accuracy can be achieved only at the expense
of computing time. However, we have demonstrated that high accuracy in the right
places can actually speed up the computation.

7.1  Future  Work 

In  Section  6.5,  we  mentioned  further  enhancements  to  Algorithm  Y  which  we 
intend  to  incorporate  in  the  near  future.  Here  we  see  how  some  of  our  ideas  may  be  applied 
to  other  problems  in  numerical  linear  algebra. 
Perfect Shifts. In Section 3.5.1 we showed how to effectively use the perfect shift strategy
in a QR-like algorithm. This new scheme enables us to use "ultimate" or "perfect"
shifts as envisaged by Parlett in [110]. It also solves the problem of immediately
deflating a known eigenpair from a symmetric matrix by deleting one of its rows and
columns. We intend to substantiate these claims with numerical experiments in the
near future [41].

Non-symmetric  Eigenproblem.  Since  Algorithm  Y  does  not  invoke  any  Gram-Schmidt 
like  process,  it  does  not  explicitly  try  to  compute  eigenvectors  that  are  orthogonal. 
Orthogonality  is  a  result  of  the  matrix  being  symmetric.  We  believe  that  many  of 
our  ideas  can  be  applied  to  the  non-symmetric  eigenproblem  in  order  to  obtain  the 
“best-conditioned”  eigenvectors. 

Lanczos Algorithm. The method proposed by Lanczos in 1952 [97], after many
modifications to account for roundoff [125, 110, 126], has become the champion among
algorithms to find some of the extreme eigenvalues of a sparse symmetric matrix. It
proceeds by incrementally forming, one row and column at a time, a tridiagonal
matrix that is similar to the sparse matrix. We would like to investigate whether there
are any benefits in using the $LDL^T$ decomposition of this tridiagonal or its translates.
The sparsity of the eigenvectors of the tridiagonal (see Figure 4.6 for an example) may
also be exploited to reduce the amount of work in the selective orthogonalization phase.
See [125] for details on the latter phase.

Rank-Revealing Factorizations. The twisted factorizations introduced in Section 3.1
enable us to accurately compute a null vector of a nearly singular tridiagonal matrix.
The particular twisted factorization used is successful because it transparently reveals
the near singularity of the tridiagonal. As discussed in Section 3.6, twisted
factorizations can also be formed for denser matrices. A nice property is that they tend to
preserve the sparsity pattern of the matrix. Coupled with their rank-revealing
properties, twisted factorizations may become an invaluable computational tool in banded
(sparse) matrix computations.

Case Studies


The following case studies present examples that illustrate several important
aspects of finding eigenvectors of a symmetric tridiagonal matrix. Much of the upcoming
material  appears  in  the  main  text  of  this  thesis.  However,  we  feel  that  these  examples 
offer  great  insight  to  the  problem  under  scrutiny,  and  so  we  have  chosen  to  present  them 
in  greater  detail  now.  The  raw  numbers  given  in  the  examples  may  also  reveal  more  than 
what  we  have  been  able  to  extract!  We  have  tried  to  make  these  case  studies  independent 
of  the  main  text  so  the  reader  should  be  able  to  follow  them  without  having  to  read  the 
whole  thesis.  Of  course,  for  more  theoretical  details  the  reader  should  read  the  main  text. 

All  the  analytical  expressions  in  the  examples  to  follow  were  obtained  by  using  the 
symbol  manipulators,  Maple  [21]  and  Mathematica  [137]. 

Case Study A

The need for accurate eigenvalues

This example demonstrates the need for high accuracy in the computed eigenvalues
in order for inverse iteration to succeed. We will show how LAPACK and EISPACK
implementations fail in the absence of such accuracy. Let

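$$T = \begin{bmatrix} 1 & \sqrt{\varepsilon} & 0 \\ \sqrt{\varepsilon} & 7\varepsilon/4 & \varepsilon/4 \\ 0 & \varepsilon/4 & 3\varepsilon/4 \end{bmatrix} \tag{A.0.1}$$

where $\varepsilon \ll 1$ is of the order of the machine precision. The eigenvalues of $T$ are:

$$\lambda_1 = \varepsilon/2 + O(\varepsilon^2), \qquad \lambda_2 = \varepsilon + O(\varepsilon^2), \qquad \lambda_3 = 1 + \varepsilon + O(\varepsilon^2).$$

Let $v_1, v_2, v_3$ denote the corresponding eigenvectors, with $\|\cdot\|_2 = 1$:

$$v_1 = \begin{bmatrix} -\sqrt{\varepsilon/2} + O(\varepsilon^{3/2}) \\ \frac{1}{\sqrt{2}}\left(1 + \frac{\varepsilon}{4}\right) + O(\varepsilon^2) \\ \frac{1}{\sqrt{2}}\left(-1 + \frac{3\varepsilon}{4}\right) + O(\varepsilon^2) \end{bmatrix}, \quad
v_2 = \begin{bmatrix} -\sqrt{\varepsilon/2} + O(\varepsilon^{3/2}) \\ \frac{1}{\sqrt{2}}\left(1 - \frac{5\varepsilon}{4}\right) + O(\varepsilon^2) \\ \frac{1}{\sqrt{2}}\left(1 + \frac{3\varepsilon}{4}\right) + O(\varepsilon^2) \end{bmatrix}, \quad
v_3 = \begin{bmatrix} 1 - \varepsilon/2 + O(\varepsilon^2) \\ \sqrt{\varepsilon} + O(\varepsilon^{3/2}) \\ \varepsilon^{3/2}/4 + O(\varepsilon^{5/2}) \end{bmatrix}.$$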

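Typically, the eigenvalues of a symmetric tridiagonal matrix can be computed only to a
guaranteed accuracy of $O(\varepsilon \|T\|)$. The eigenvalues of $T$ as computed by MATLAB's eig
function, which uses the QR algorithm, are

$$\hat{\lambda}_1 = \varepsilon, \qquad \hat{\lambda}_2 = \varepsilon, \qquad \hat{\lambda}_3 = 1 + \varepsilon.$$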
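
The first eigenvector may be computed as

$$y_1 = (T - \hat{\lambda}_1 I)^{-1} b_1,$$

where $b_1 = \xi_1 v_1 + \xi_2 v_2 + \xi_3 v_3$, $\xi_1^2 + \xi_2^2 + \xi_3^2 = 1$, and $\xi_2 \neq 0$.
In exact arithmetic,

$$\begin{aligned}
y_1 &= \frac{\xi_1}{\lambda_1 - \hat{\lambda}_1} v_1 + \frac{\xi_2}{\lambda_2 - \hat{\lambda}_1} v_2 + \frac{\xi_3}{\lambda_3 - \hat{\lambda}_1} v_3 \\
    &= \frac{\xi_2}{\lambda_2 - \hat{\lambda}_1} \left[ v_2 + \frac{\xi_1}{\xi_2} \frac{\lambda_2 - \hat{\lambda}_1}{\lambda_1 - \hat{\lambda}_1} v_1 + \frac{\xi_3}{\xi_2} \frac{\lambda_2 - \hat{\lambda}_1}{\lambda_3 - \hat{\lambda}_1} v_3 \right] \\
    &= \frac{1}{O(\varepsilon^2)} \left( v_2 + O(\varepsilon) v_1 + O(\varepsilon^2) v_3 \right)
\end{aligned}$$

provided $(\xi_1/\xi_2)$ and $(\xi_3/\xi_2)$ are $O(1)$. Due to the inevitable roundoff errors in finite
precision, the best we can hope to compute is

$$\hat{y}_1 = \frac{1}{O(\varepsilon^2)} \left( v_2 + O(\varepsilon) v_1 + O(\varepsilon) v_3 \right).$$

This vector is normalized to remove the $1/O(\varepsilon^2)$ factor and give $\hat{v}_1$. The second eigenvector
may be computed as

$$y_2 = (T - \hat{\lambda}_2 I)^{-1} b_2.$$

Since $\hat{\lambda}_2 = \hat{\lambda}_1$, the computed value of $y_2$ is almost parallel to $\hat{v}_1$ (assuming that $b_2$ has a
non-negligible component in the direction of $v_2$), i.e.,

$$\hat{y}_2 = \frac{1}{O(\varepsilon^2)} \left( v_2 + O(\varepsilon) v_1 + O(\varepsilon) v_3 \right).$$

Since $\hat{\lambda}_1$ and $\hat{\lambda}_2$ are coincident, typical implementations of inverse iteration orthogonalize
the vectors $\hat{v}_1$ and $\hat{y}_2$ by the modified Gram-Schmidt (MGS) method in an attempt to reveal
the second eigenvector:

$$\begin{aligned}
z &= \hat{y}_2 - (\hat{y}_2^T \hat{v}_1)\, \hat{v}_1 \\
  &= \frac{1}{O(\varepsilon^2)} \left( O(\varepsilon) v_1 + O(\varepsilon) v_2 + O(\varepsilon) v_3 \right), \\
\hat{v}_2 &= z / \|z\|.
\end{aligned}$$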

As we see above, the resulting vector $\hat{v}_2$ is far from being orthogonal to $\hat{v}_1$. A second step of
MGS, as recommended in [110, pp. 105-109], does not remedy the situation. Although the
second step may remove the unwanted component of $\hat{v}_1$ from $\hat{v}_2$, the undesirable component
of $v_3$ will remain (remember that $\hat{\lambda}_2$ and $\hat{\lambda}_3$ are well separated and most implementations
will never explicitly orthogonalize $\hat{v}_2$ and $\hat{v}_3$). Some implementations may repeat the above
process of inverse iteration followed by one or two steps of MGS in order to compute the
second eigenvector. However, the same sort of computations as described above are repeated,
and iterating does not satisfactorily solve the problem.
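
The failure mode described above can be imitated with a small toy model. This is only a sketch under stated assumptions: vectors are written in coordinates with respect to the exact eigenvectors $(v_1, v_2, v_3)$, the contamination level `eta` and the coefficients 1, 2, 3, -1 are hypothetical values chosen for illustration, and `mgs_step` is an illustrative helper, not code from Tinvit or Dstein. Both iterates are dominated by $v_2$, so one MGS step removes the common direction and leaves a vector made entirely of the former "noise".

```python
import math


def normalize(v):
    """Scale v to unit Euclidean length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mgs_step(y, v):
    """Remove from y its component along the unit vector v, then normalize."""
    c = dot(y, v)
    return normalize([yi - c * vi for yi, vi in zip(y, v)])


# Coordinates are relative to the exact eigenvectors (v1, v2, v3).
eta = 1.0e-6  # hypothetical contamination level (assumption)
v1_hat = normalize([1.0 * eta, 1.0, 2.0 * eta])   # first computed vector
y2_hat = normalize([3.0 * eta, 1.0, -1.0 * eta])  # second iterate, nearly parallel

v2_hat = mgs_step(y2_hat, v1_hat)

print("v2_hat . v1_hat       =", dot(v2_hat, v1_hat))
print("component along v1    =", v2_hat[0])
print("component along v3    =", v2_hat[2])
```

In this toy setting the orthogonalized vector is numerically orthogonal to `v1_hat`, yet it carries a component of order one along $v_3$, which no further orthogonalization against `v1_hat` can remove.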

We now exhibit the above behavior by giving a numerical example. Set $\varepsilon$ to
the machine precision ($\approx 2.2 \times 10^{-16}$ in IEEE double precision arithmetic) in the matrix
of (A.0.1).

The EISPACK implementation Tinvit does a step of inverse iteration followed by
MGS and repeats this until convergence. See Figure 2.3 and [118] for more details. Given
the computed eigenvalues as $\hat{\lambda}_1 = \varepsilon$, $\hat{\lambda}_2 = \varepsilon(1 + \varepsilon)$ and $\hat{\lambda}_3 = 1 + \varepsilon$, Tinvit computes the
first eigenvector as:

$$\hat{v}_1 = \begin{bmatrix} -1.05 \cdot 10^{-8} \\ 0.707 \\ 0.707 \end{bmatrix}.$$

Note that here we chose $\hat{\lambda}_2$ to be a floating point number just bigger than $\hat{\lambda}_1$ to prevent
Tinvit from perturbing $\hat{\lambda}_2$ by $\varepsilon \|T\|$. The computation of the second eigenvector is summarized
in the following table:


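| | Step 1: Before MGS | Step 1: After MGS |
|---|---|---|
| Iterate ($y$) | $(-1.05 \cdot 10^{-8},\ 0.707,\ 0.707)^T$ | $(8.34 \cdot 10^{-9},\ -0.738,\ 0.674)^T$ |
| $y^T \hat{v}_1$ | 1.000 | 0.0452 |
| $y^T \hat{v}_3$ | $5.8 \cdot 10^{-25}$ | $2.7 \cdot 10^{-9}$ |

Tinvit accepts the above iterate as having converged and hence the eigenvectors
output are such that $\hat{v}_2^T \hat{v}_1 = 0.0452$ and $\hat{v}_2^T \hat{v}_3 = 2.7 \cdot 10^{-9}$! Due to the unwanted component
of $v_3$, the residual norm $\|(T - \hat{\lambda}_2 I)\hat{v}_2\| = 2.7 \cdot 10^{-9}$. All the other residual norms are
$O(\varepsilon \|T\|)$.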

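The LAPACK implementation Dstein performs two additional iterations after
detecting convergence (see Figure 2.4 and [87]). As in Tinvit, the first eigenvector is
computed accurately by Dstein. The iterations for the computation of the second eigenvector
are summarized in the following table.

| Step | Stage | Iterate ($y$) | $\lvert y^T \hat{v}_1 \rvert$ | $\lvert y^T \hat{v}_3 \rvert$ |
|---|---|---|---|---|
| 1 | Before MGS | $(1.05 \cdot 10^{-8},\ -0.707,\ -0.707)^T$ | 1.000 | $3.9 \cdot 10^{-24}$ |
| 1 | After MGS | $(1.04 \cdot 10^{-8},\ -0.697,\ 0.716)^T$ | 0.0014 | $4.1 \cdot 10^{-11}$ |
| 2 | Before MGS | $(1.05 \cdot 10^{-8},\ -0.707,\ -0.707)^T$ | 1.000 | $5.8 \cdot 10^{-25}$ |
| 2 | After MGS | $(1.05 \cdot 10^{-8},\ -0.7069,\ 0.7073)^T$ | $3.0 \cdot 10^{-4}$ | $2.9 \cdot 10^{-12}$ |
| 3 | Before MGS | $(1.05 \cdot 10^{-8},\ -0.707,\ -0.707)^T$ | 1.000 | $2.2 \cdot 10^{-24}$ |
| 3 | After MGS | $(1.05 \cdot 10^{-8},\ -0.707108,\ 0.707105)^T$ | $1.9 \cdot 10^{-6}$ | $1.1 \cdot 10^{-13}$ |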

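Thus the eigenvectors output by Dstein are such that

$$\hat{v}_2^T \hat{v}_1 = 1.9 \cdot 10^{-6}, \qquad \hat{v}_2^T \hat{v}_3 = 1.1 \cdot 10^{-13} \qquad \text{and} \qquad \|(T - \hat{\lambda}_2 I)\hat{v}_2\| = 1.1 \cdot 10^{-13}.$$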

Some  implementations  such  as  the  PeIGS  software  library  [52,  54]  perform  two 
steps  of  the  MGS  process  after  every  inverse  iteration  step.  To  exhibit  that  this  may  also 
not  be  satisfactory,  we  modified  Tinvit  to  perform  two  MGS  steps.  The  following  tabulates 
the  iteration  in  computing  the  second  eigenvector. 


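| | Step 1: Before MGS | Step 1: After 1st MGS | Step 1: After 2nd MGS |
|---|---|---|---|
| Iterate ($y$) | $(1.05 \cdot 10^{-8},\ 0.707,\ 0.707)^T$ | $(8.34 \cdot 10^{-9},\ -0.738,\ 0.674)^T$ | $(-7.9 \cdot 10^{-9},\ -0.70710678118,\ 0.70710678118)^T$ |
| $y^T \hat{v}_1$ | 1.000 | 0.0452 | $6.6 \cdot 10^{-16}$ |
| $y^T \hat{v}_3$ | $5.8 \cdot 10^{-25}$ | $2.7 \cdot 10^{-9}$ | $2.7 \cdot 10^{-9}$ |

Hence even though the vectors $\hat{v}_1$ and $\hat{v}_2$ are numerically orthogonal, $\hat{v}_2$ contains an
unwanted component of $v_3$. As a result $\|(T - \hat{\lambda}_2 I)\hat{v}_2\| = 2.7 \cdot 10^{-9}$.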

Note  that  the  above  problems  occur  because  the  eigenvalues  are  not  accurate 
enough.  Current  inverse  iteration  implementations  do  not  detect  such  inaccuracies. 

Eigenvalues of the matrix in (A.0.1) may be computed to high relative accuracy
by the bisection algorithm. If such eigenvalues are input to Tinvit or Dstein, the
computed eigenvectors are found to be numerically orthogonal. In fact, even if no explicit
orthogonalization is done inside Tinvit and Dstein, the eigenvectors output turn out to
be numerically orthogonal. We will investigate this phenomenon in Case Study B.
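
As a rough illustration of the bisection algorithm mentioned above, the following sketch counts eigenvalues below a trial point through the signs of the pivots of $T - xI = LDL^T$ and then bisects. It is a minimal sketch, not the LAPACK or EISPACK routine: the function names, the zero-pivot guard and the starting interval $[-2, 2]$ (which contains the spectrum of the matrix in (A.0.1)) are assumptions made here for illustration.

```python
import math
import sys


def negcount(a, b, x):
    """Count eigenvalues of the symmetric tridiagonal (a, b) that lie below x.

    a is the diagonal and b the off-diagonal. The count equals the number
    of negative pivots d_i in the factorization T - x*I = L D L^T.
    """
    count = 0
    d = 1.0
    for i in range(len(a)):
        if i == 0:
            d = a[0] - x
        else:
            d = (a[i] - x) - b[i - 1] * b[i - 1] / d
        if d == 0.0:
            d = -sys.float_info.min  # nudge an exact zero pivot off zero
        if d < 0.0:
            count += 1
    return count


def bisect_eigenvalue(a, b, k, lo, hi):
    """Return the k-th smallest eigenvalue (k = 0, 1, ...), assumed to lie in [lo, hi]."""
    while True:
        mid = lo + (hi - lo) / 2.0
        if mid <= lo or mid >= hi:  # no double strictly between lo and hi: stop
            return mid
        if negcount(a, b, mid) > k:
            hi = mid
        else:
            lo = mid


eps = sys.float_info.epsilon  # 2**-52, about 2.2e-16
diag = [1.0, 7.0 * eps / 4.0, 3.0 * eps / 4.0]
off = [math.sqrt(eps), eps / 4.0]

lam = [bisect_eigenvalue(diag, off, k, -2.0, 2.0) for k in range(3)]
rel_err = [abs(lam[0] - eps / 2.0) / (eps / 2.0), abs(lam[1] - eps) / eps]
print("eigenvalues:", lam)
print("relative deviation of the two small ones from eps/2 and eps:", rel_err)
```

The point of the sketch is that the two tiny eigenvalues are separated and resolved relative to their own size, whereas an absolute error of order $\varepsilon \|T\|$ would be as large as the eigenvalues themselves.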

Case Study B

Bidiagonals  are  Better 


The  purpose  of  this  case  study  is  to  show  that  a  tridiagonal  matrix  is  “better” 
represented  by  a  product  of  bidiagonal  matrices  for  the  task  of  computing  eigenvalues  and 
eigenvectors. 

When  any  numerical  algorithm  is  implemented  on  a  computer,  roundoff  error  needs 
to  be  accounted  for,  and  it  is  good  practice  to  show  that  calculations  are  not  destroyed  by 
the  limited  precision  of  the  arithmetic  operations.  Wilkinson  made  popular  the  art  of 
backward error analysis, whereby the computed solution is shown (if possible) to be the
exact  solution  corresponding  to  data  that  is  slightly  different  from  the  input  data  [135]. 
When  combined  with  knowledge  about  the  sensitivity  of  the  solution  to  small  changes  in 
the  data,  this  approach  enables  us  to  prove  the  accuracy  of  various  numerical  procedures. 

A real, symmetric tridiagonal matrix $T$ is traditionally represented by its $n$
diagonal and $n - 1$ off-diagonal elements. Some highly accurate algorithms for finding the
eigenvalues of $T$, such as bisection, can be shown to find an exact eigenpair of $T + \delta T$, where
the perturbation $\delta T$ is a small relative perturbation in only the off-diagonals of $T$. In this
Case Study, by examining the effect of such perturbations on the spectrum of $T$, we show
that for our purposes it is better to represent a tridiagonal matrix by its bidiagonal factors.

First,  we  examine  the  matrix  we  encountered  in  Case  Study  A  earlier,  and  observe 
that  it  does  have  the  perturbation  properties  we  desire.  However  not  all  tridiagonals  share 
this  property  and  we  will  illustrate  this  with  another  matrix. 
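
Consider the matrix $T_1$ and a small perturbation:

$$T_1 = \begin{bmatrix} 1 & \sqrt{\varepsilon} & 0 \\ \sqrt{\varepsilon} & 7\varepsilon/4 & \varepsilon/4 \\ 0 & \varepsilon/4 & 3\varepsilon/4 \end{bmatrix}, \qquad
T_1 + \delta T_1 = \begin{bmatrix} 1 & \sqrt{\varepsilon}(1 + \varepsilon) & 0 \\ \sqrt{\varepsilon}(1 + \varepsilon) & 7\varepsilon/4 & \varepsilon(1 + \varepsilon)/4 \\ 0 & \varepsilon(1 + \varepsilon)/4 & 3\varepsilon/4 \end{bmatrix} \tag{B.0.1}$$

where $\varepsilon$ is of the order of the machine precision. The eigenvalues of $T_1$ are

$$\lambda_1 = \varepsilon/2 - \varepsilon^2/4 + O(\varepsilon^3), \qquad \lambda_2 = \varepsilon - \varepsilon^2/2 + O(\varepsilon^3), \qquad \lambda_3 = 1 + \varepsilon + O(\varepsilon^2),$$

and those of $T_1 + \delta T_1$ are

$$\lambda_1 + \delta\lambda_1 = \varepsilon/2 - 3\varepsilon^2/2 + O(\varepsilon^3), \qquad \lambda_2 + \delta\lambda_2 = \varepsilon - 5\varepsilon^2/4 + O(\varepsilon^3), \qquad \lambda_3 + \delta\lambda_3 = 1 + \varepsilon + O(\varepsilon^2).$$

The eigenvector matrix of $T_1$ is

$$V = \begin{bmatrix}
-\sqrt{\varepsilon/2} + O(\varepsilon^{3/2}) & -\sqrt{\varepsilon/2} + O(\varepsilon^{3/2}) & 1 - \varepsilon/2 + O(\varepsilon^2) \\
\frac{1}{\sqrt{2}}\left(1 + \frac{\varepsilon}{4}\right) + O(\varepsilon^2) & \frac{1}{\sqrt{2}}\left(1 - \frac{5\varepsilon}{4}\right) + O(\varepsilon^2) & \sqrt{\varepsilon} + O(\varepsilon^{3/2}) \\
\frac{1}{\sqrt{2}}\left(-1 + \frac{3\varepsilon}{4}\right) + O(\varepsilon^2) & \frac{1}{\sqrt{2}}\left(1 + \frac{3\varepsilon}{4}\right) + O(\varepsilon^2) & \varepsilon^{3/2}/4 + O(\varepsilon^{5/2})
\end{bmatrix},$$

while the perturbed matrix $T_1 + \delta T_1$ has the eigenvectors

$$V + \delta V = \begin{bmatrix}
-\sqrt{\varepsilon/2} + O(\varepsilon^{3/2}) & -\sqrt{\varepsilon/2} + O(\varepsilon^{3/2}) & 1 - \varepsilon/2 + O(\varepsilon^2) \\
\frac{1}{\sqrt{2}}\left(1 + \frac{9\varepsilon}{4}\right) + O(\varepsilon^2) & \frac{1}{\sqrt{2}}\left(1 - \frac{13\varepsilon}{4}\right) + O(\varepsilon^2) & \sqrt{\varepsilon} + O(\varepsilon^{3/2}) \\
\frac{1}{\sqrt{2}}\left(-1 + \frac{11\varepsilon}{4}\right) + O(\varepsilon^2) & \frac{1}{\sqrt{2}}\left(1 + \frac{11\varepsilon}{4}\right) + O(\varepsilon^2) & \varepsilon^{3/2}/4 + O(\varepsilon^{5/2})
\end{bmatrix}.$$

From the above expressions, it is easy to see that $|\delta\lambda_i| / |\lambda_i| = O(\varepsilon)$ and $|\delta V_{ij}| / |V_{ij}| = O(\varepsilon)$
for $i, j = 1, 2, 3$. Hence, the small relative perturbations in the off-diagonals of $T_1$ cause small
relative changes in the eigenvalues and small relative changes in the eigenvector components.
The eigenvalues of such a matrix may be computed to high relative accuracy by the bisection
algorithm. It turns out that the high accuracy of these eigenvalues obviates the need to
orthogonalize in implementations of inverse iteration. Indeed, we modified the LAPACK
and EISPACK implementations of inverse iteration (Dstein and Tinvit respectively) by
turning off all orthogonalization, and found the computed eigenvectors to be numerically
orthogonal. Note that orthogonality is achieved despite the fact that $\lambda_1$ and $\lambda_2$ are quite
close together!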

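The theory developed in [37] does predict that the eigenvalues of the matrix $T_1$ given
in (B.0.1) are determined to high relative accuracy with respect to small relative
perturbations in the off-diagonal elements. However, there are tridiagonals that behave differently.
To demonstrate this, we consider the tridiagonals:

$$T_2 = \begin{bmatrix} 1 - \sqrt{\varepsilon} & \varepsilon^{1/4}\sqrt{1 - 7\varepsilon/4} & 0 \\ \varepsilon^{1/4}\sqrt{1 - 7\varepsilon/4} & \sqrt{\varepsilon} + 7\varepsilon/4 & \varepsilon/4 \\ 0 & \varepsilon/4 & 3\varepsilon/4 \end{bmatrix},$$

$$T_2 + \delta T_2 = \begin{bmatrix} 1 - \sqrt{\varepsilon} & \varepsilon^{1/4}\sqrt{1 - 7\varepsilon/4}\,(1 + \varepsilon) & 0 \\ \varepsilon^{1/4}\sqrt{1 - 7\varepsilon/4}\,(1 + \varepsilon) & \sqrt{\varepsilon} + 7\varepsilon/4 & \varepsilon(1 + \varepsilon)/4 \\ 0 & \varepsilon(1 + \varepsilon)/4 & 3\varepsilon/4 \end{bmatrix}.$$

The eigenvalues of $T_2$ are

$$\lambda_1 = \varepsilon/2 + \varepsilon^{3/2}/8 + O(\varepsilon^2), \qquad \lambda_2 = \varepsilon - \varepsilon^{3/2}/8 + O(\varepsilon^2), \qquad \lambda_3 = 1 + \varepsilon + O(\varepsilon^2),$$

while those of the perturbed matrix $T_2 + \delta T_2$ are

$$\lambda_1 + \delta\lambda_1 = \varepsilon/2 - 7\varepsilon^{3/2}/8 + O(\varepsilon^2), \qquad \lambda_2 + \delta\lambda_2 = \varepsilon - 9\varepsilon^{3/2}/8 + O(\varepsilon^2), \qquad \lambda_3 + \delta\lambda_3 = 1 + \varepsilon + O(\varepsilon^2).$$

The eigenvectors of $T_2$ are given by

$$V = \begin{bmatrix}
\frac{\varepsilon^{1/4}}{\sqrt{2}}\left(1 + \frac{\sqrt{\varepsilon}}{2}\right) + O(\varepsilon^{5/4}) & -\frac{\varepsilon^{1/4}}{\sqrt{2}}\left(1 + \frac{\sqrt{\varepsilon}}{2}\right) + O(\varepsilon^{5/4}) & 1 - \sqrt{\varepsilon}/2 + O(\varepsilon) \\
-\frac{1}{\sqrt{2}}\left(1 - \frac{\sqrt{\varepsilon}}{2}\right) + O(\varepsilon) & \frac{1}{\sqrt{2}}\left(1 - \frac{\sqrt{\varepsilon}}{2}\right) + O(\varepsilon) & \varepsilon^{1/4}\left(1 + \frac{\sqrt{\varepsilon}}{2}\right) + O(\varepsilon^{5/4}) \\
\frac{1}{\sqrt{2}}\left(1 - \frac{3\varepsilon}{4}\right) + O(\varepsilon^{3/2}) & \frac{1}{\sqrt{2}}\left(1 + \frac{3\varepsilon}{4}\right) + O(\varepsilon^{3/2}) & \frac{\varepsilon^{5/4}}{4}\left(1 + \frac{\sqrt{\varepsilon}}{2}\right) + O(\varepsilon^{9/4})
\end{bmatrix},$$

and the eigenvectors of $T_2 + \delta T_2$ are

$$V + \delta V = \begin{bmatrix}
\frac{\varepsilon^{1/4}}{\sqrt{2}}\left(1 + \frac{5\sqrt{\varepsilon}}{2}\right) + O(\varepsilon^{5/4}) & -\frac{\varepsilon^{1/4}}{\sqrt{2}}\left(1 - \frac{3\sqrt{\varepsilon}}{2}\right) + O(\varepsilon^{5/4}) & 1 - \sqrt{\varepsilon}/2 + O(\varepsilon) \\
-\frac{1}{\sqrt{2}}\left(1 + \frac{3\sqrt{\varepsilon}}{2}\right) + O(\varepsilon) & \frac{1}{\sqrt{2}}\left(1 - \frac{5\sqrt{\varepsilon}}{2}\right) + O(\varepsilon) & \varepsilon^{1/4}\left(1 + \frac{\sqrt{\varepsilon}}{2}\right) + O(\varepsilon^{5/4}) \\
\frac{1}{\sqrt{2}}\left(1 - 2\sqrt{\varepsilon}\right) + O(\varepsilon) & \frac{1}{\sqrt{2}}\left(1 + 2\sqrt{\varepsilon}\right) + O(\varepsilon) & \frac{\varepsilon^{5/4}}{4}\left(1 + \frac{\sqrt{\varepsilon}}{2}\right) + O(\varepsilon^{9/4})
\end{bmatrix}.$$

From the above expressions, we see that

$$|\delta\lambda_i / \lambda_i| = O(\sqrt{\varepsilon}) \quad \text{and} \quad |\delta V_{ij} / V_{ij}| = O(\sqrt{\varepsilon}) \quad \text{for } i = 1, 2, 3 \text{ and } j = 1, 2.$$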


Thus $T_2$ does not determine its eigenvalues and eigenvector components to high
relative accuracy and consequently, it is unlikely that we would be able to compute
numerically orthogonal eigenvectors by inverse iteration without taking recourse to explicit
orthogonalization. To confirm this, we repeated the experiment of turning off all
orthogonalization in Dstein and Tinvit but found the computed “eigenvectors” to have dot products
as large as $O(\sqrt{\varepsilon})$.
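
The contrast between $T_1$ and $T_2$ can be checked numerically with a small experiment. The sketch below makes several assumptions that are not part of the text: it uses $\varepsilon = 10^{-4}$ instead of the machine precision so that the changes are resolvable in double precision, it computes the smallest eigenvalue with a simple pivot-counting bisection (the helper names are hypothetical), and it perturbs every off-diagonal entry by the factor $(1 + \varepsilon)$ as in (B.0.1).

```python
import math
import sys


def negcount(a, b, x):
    """Number of negative pivots of T - x*I = L D L^T (eigenvalues below x)."""
    count = 0
    d = 1.0
    for i in range(len(a)):
        d = (a[i] - x) if i == 0 else (a[i] - x) - b[i - 1] * b[i - 1] / d
        if d == 0.0:
            d = -sys.float_info.min  # avoid dividing by an exact zero pivot
        if d < 0.0:
            count += 1
    return count


def smallest_eigenvalue(a, b, lo=-2.0, hi=2.0):
    """Bisection for the smallest eigenvalue, assumed to lie in [lo, hi]."""
    while True:
        mid = lo + (hi - lo) / 2.0
        if mid <= lo or mid >= hi:
            return mid
        if negcount(a, b, mid) > 0:
            hi = mid
        else:
            lo = mid


def relative_change(a, b, e):
    """Relative change of the smallest eigenvalue when b -> b * (1 + e)."""
    lam = smallest_eigenvalue(a, b)
    lam_pert = smallest_eigenvalue(a, [bi * (1.0 + e) for bi in b])
    return abs(lam_pert - lam) / abs(lam)


e = 1.0e-4  # stand-in for epsilon (assumption, see lead-in)
r = math.sqrt(e)
t1 = ([1.0, 7 * e / 4, 3 * e / 4], [r, e / 4])
t2 = ([1.0 - r, r + 7 * e / 4, 3 * e / 4],
      [math.sqrt(r) * math.sqrt(1.0 - 7 * e / 4), e / 4])

change_t1 = relative_change(t1[0], t1[1], e)
change_t2 = relative_change(t2[0], t2[1], e)
print("T1: relative change in lambda_1 =", change_t1, " (compare with e =", e, ")")
print("T2: relative change in lambda_1 =", change_t2, " (compare with sqrt(e) =", r, ")")
```

With these choices the relative change for $T_1$ should be a small multiple of $\varepsilon$, while for $T_2$ it should be a small multiple of $\sqrt{\varepsilon}$, in line with the expansions given above.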

However, there is a way out of this impasse. Since $T_2$ is positive definite, we can
form its Cholesky decomposition,

$$T_2 = L L^T.$$

Due to the tridiagonal form of $T_2$, the Cholesky factor $L$ is bidiagonal. It is now known
that small relative changes to the entries of a bidiagonal matrix cause small relative changes
to its singular values, and small changes in the directions of the singular vectors if the
relative gaps between the singular values are large. Note that the eigenvalues of $T_2$ are the
squares of the singular values of $L$ while its eigenvectors are the left singular vectors of $L$.
This property of bidiagonal matrices gives us an impetus to produce algorithms where the
backward errors may be posed as small relative changes in $L$ rather than in $T$. By doing
so we can eliminate the need for orthogonalization when finding eigenvectors of $T_2$ since its
eigenvalues are relatively far apart. We have accomplished this task in Chapter 4.
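
To make the bidiagonal structure concrete, here is a minimal sketch of the Cholesky factorization of a positive definite symmetric tridiagonal, applied to $T_2$ with $\varepsilon$ set to the double precision machine epsilon. The function name and the storage convention (`p` for the diagonal of $L$, `q` for its subdiagonal) are choices made here for illustration and are not taken from the thesis.

```python
import math
import sys


def cholesky_bidiagonal(a, b):
    """Cholesky factor of a positive definite symmetric tridiagonal T = (a, b).

    Returns (p, q): the diagonal and subdiagonal of the lower bidiagonal L
    with T = L L^T. Only these 2n - 1 numbers are needed to represent L.
    """
    p = [math.sqrt(a[0])]
    q = []
    for i in range(1, len(a)):
        q.append(b[i - 1] / p[i - 1])
        p.append(math.sqrt(a[i] - q[i - 1] ** 2))
    return p, q


eps = sys.float_info.epsilon
r = math.sqrt(eps)
a = [1.0 - r, r + 7.0 * eps / 4.0, 3.0 * eps / 4.0]
b = [math.sqrt(r) * math.sqrt(1.0 - 7.0 * eps / 4.0), eps / 4.0]

p, q = cholesky_bidiagonal(a, b)

# Multiply the factors back together: T(i,i) = q[i-1]^2 + p[i]^2, T(i+1,i) = q[i]*p[i].
a_back = [p[0] ** 2] + [q[i - 1] ** 2 + p[i] ** 2 for i in range(1, len(a))]
b_back = [q[i] * p[i] for i in range(len(b))]
print("diag(L)    =", p)
print("subdiag(L) =", q)
```

The factor has as many free parameters as $T_2$ itself, so working with $L$ costs no extra storage; what changes is which perturbations count as "small".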

The observant reader will notice that both $T_1$ and $T_2$ are positive definite and thus
admit  Cholesky  decomposition.  What  can  we  do  if  the  tridiagonal  is  indefinite?  In  such 
a  case,  we  could  turn  to  the  LU  decomposition  of  the  tridiagonal.  However,  the  relative 
perturbation  theory  in  the  indefinite  case  has  not  been  extensively  studied  and  is  not  well 
understood.  We  attempt  to  answer  some  questions  in  Chapter  5. 

Case Study C

Multiple representations lead to orthogonality


In Chapter 4, we showed how to compute orthogonal approximations to
eigenvectors of a positive definite tridiagonal when the eigenvalues are separated by large relative
gaps. For example, for the eigenvalues $\varepsilon$ and $2\varepsilon$, we are able to compute orthogonal
eigenvectors without resorting to Gram-Schmidt (see Chapter 4 for more details).

However, eigenvalues $1 + \sqrt{\varepsilon}$ and $1 + 2\sqrt{\varepsilon}$ have a relative gap of $O(\sqrt{\varepsilon})$ and the
methods of Chapter 4 can only guarantee that the dot product of the approximate
eigenvectors is $O(\sqrt{\varepsilon})$.

In this section, we present examples to demonstrate ways of obtaining
orthogonality even when relative gaps appear to be tiny. Consider the matrix

$$T_0 = \begin{bmatrix}
.520000005885958 & .519230209355285 & & \\
.519230209355285 & .589792290767499 & .36719192898916 & \\
 & .36719192898916 & 1.89020772569828 & 2.7632618547882 \cdot 10^{-8} \\
 & & 2.7632618547882 \cdot 10^{-8} & 1.00000002235174
\end{bmatrix}$$

with eigenvalues

$$\lambda_1 \approx \varepsilon, \qquad \lambda_2 \approx 1 + \sqrt{\varepsilon}, \qquad \lambda_3 \approx 1 + 2\sqrt{\varepsilon}, \qquad \lambda_4 \approx 2.0, \tag{C.0.1}$$

where $\varepsilon \approx 2.2 \times 10^{-16}$ (all our results are in IEEE double precision arithmetic).

Standard methods can compute approximate eigenvectors of $\lambda_1$ and $\lambda_4$ that are
orthogonal to working accuracy. Algorithm X of Section 4.4.3 computes the following
approximations to the interior eigenvectors

$$\hat{v}_2 = \begin{bmatrix} .5000000027411224 \\ .4622227318424748 \\ -.1906571623774349 \\ .7071067740172921 \end{bmatrix}, \qquad
\hat{v}_3 = \begin{bmatrix} -.5000000074616407 \\ -.4622227505556183 \\ .1906571293895208 \\ .7071067673414712 \end{bmatrix}$$

but $|\hat{v}_2^T \hat{v}_3| = O(\sqrt{\varepsilon})$. By standard perturbation theory, the subspace spanned by $v_2$ and $v_3$
is not very sensitive to small changes in $T_0$ and thus, $\hat{v}_2$ and $\hat{v}_3$ are orthogonal to the
eigenvectors $\hat{v}_1$ and $\hat{v}_4$. However, the individual eigenvectors $v_2$ and $v_3$ are more sensitive
and the $O(\sqrt{\varepsilon})$ dot products are a consequence.

In spite of the above observation, we can ask ourselves if it is possible to obtain
orthogonal $\hat{v}_2$ and $\hat{v}_3$ without resorting to Gram-Schmidt. A natural approach is to try and
extend the ideas of Chapter 4 which enable us to compute orthogonal eigenvectors even
when eigenvalues are as close as $\varepsilon$ and $2\varepsilon$. We make two crucial observations:

1. Eigenvectors are shift invariant, i.e.,

   Eigenvectors of $T$ = Eigenvectors of $T - \mu I$, for all $\mu$.

2. Relative gaps can be changed by shifting, i.e.,

   $\mathrm{relgap}(\lambda_i, \lambda_{i+1}) \neq \mathrm{relgap}(\lambda_i - \mu, \lambda_{i+1} - \mu)$.

For example, the interior eigenvalues of $T_0 - I$ are $\sqrt{\varepsilon}$ and $2\sqrt{\varepsilon}$, and these shifted
eigenvalues now have a large relative gap! As we did for the positive definite case, we
can find bidiagonal factors of the indefinite matrix $T_0 - I$. However, unlike the positive
definite case there is no guarantee that these bidiagonal factors will determine their small
eigenvalues to high relative accuracy.
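
The effect of shifting on the relative gap is simple arithmetic, as the following sketch shows. The definition of `relgap` used here (the gap divided by the larger magnitude) is an assumption made for illustration; the precise definition is given in the main text of the thesis.

```python
import math
import sys


def relgap(x, y):
    """Relative gap between x and y (assumed form: |x - y| / max(|x|, |y|))."""
    return abs(x - y) / max(abs(x), abs(y))


eps = sys.float_info.epsilon
s = math.sqrt(eps)  # exactly 2**-26 in IEEE double precision

lam2, lam3 = 1.0 + s, 1.0 + 2.0 * s
before = relgap(lam2, lam3)             # about sqrt(eps): tiny
after = relgap(lam2 - 1.0, lam3 - 1.0)  # shifted by mu = 1

print("relative gap before shifting:", before)
print("relative gap after shifting :", after)
```

With this definition the shifted pair $\sqrt{\varepsilon}$, $2\sqrt{\varepsilon}$ has relative gap exactly 0.5, while the unshifted pair has a relative gap of about $1.5 \times 10^{-8}$.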

A  distinguishing  feature  of  factoring  a  positive  definite  matrix  is  the  absence  of 
any  element  growth.  When  we  factor  an  indefinite  matrix,  the  near  singularity  of  principal 
submatrices  can  cause  large  element  growths.  However,  in  many  cases  we  will  be  able  to 
form the $L$, $U$ factors of a tridiagonal matrix without any element growth.

When we proceed to find the decomposition

$$T_0 - \mu I = L_0 D_0 L_0^T, \tag{C.0.2}$$

where $\mu = 1$, we get the following elements on the diagonal of $D_0$ and off-diagonal of $L_0$:

$$\mathrm{diag}(D_0) = \begin{bmatrix} -.4799999941140420 \\ .1514589857947483 \\ 3.074504323352656 \cdot 10^{-7} \\ 1.986821250068485 \cdot 10^{-8} \end{bmatrix}, \qquad
\mathrm{diag}(L_0, -1) = \begin{bmatrix} -1.081729616088125 \\ 2.424365428451800 \\ .08987666186707277 \end{bmatrix}.$$

We observe that there is no element growth in forming this decomposition. In particular,

$$\text{Element Growth} \stackrel{\mathrm{def}}{=} \max_i \left| \frac{D_0(i,i)}{T_0(i,i) - \mu} \right|^{1/2} = 1. \tag{C.0.3}$$


Note that the element growth always equals 1 when factoring a positive definite tridiagonal
(see Lemma 4.4.1).
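
The factorization (C.0.2) and the element growth (C.0.3) can be reproduced from the printed entries of $T_0$ with a few lines of code. This is a sketch under assumptions: the function names are invented here, the recurrence is the plain $LDL^T$ recurrence for a tridiagonal rather than the differential qd-like transformations of Section 4.4.1, and because the matrix entries are printed to about 15 digits the later pivots agree with the values above only to several digits.

```python
import math


def ldlt_shifted(a, b, mu):
    """Factor T - mu*I = L D L^T for the symmetric tridiagonal T = (a, b).

    Returns (d, l): the diagonal of D and the subdiagonal of the unit
    lower bidiagonal L.
    """
    d = [a[0] - mu]
    l = []
    for i in range(len(b)):
        l.append(b[i] / d[i])
        d.append((a[i + 1] - mu) - l[i] * b[i])
    return d, l


def element_growth(a, d, mu):
    """Element growth as in (C.0.3): max_i |D(i,i) / (T(i,i) - mu)|^(1/2)."""
    return max(math.sqrt(abs(di / (ai - mu))) for ai, di in zip(a, d))


# Entries of T0 as printed in the text.
a0 = [0.520000005885958, 0.589792290767499, 1.89020772569828, 1.00000002235174]
b0 = [0.519230209355285, 0.36719192898916, 2.7632618547882e-8]

d0, l0 = ldlt_shifted(a0, b0, 1.0)
growth0 = element_growth(a0, d0, 1.0)

print("diag(D0)     =", d0)
print("diag(L0, -1) =", l0)
print("element growth =", growth0)
```

The maximum in (C.0.3) is attained at $i = 1$, where $D_0(1,1) = T_0(1,1) - \mu$ by construction, so the computed growth is exactly 1.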

We say that the decomposition (C.0.2) is a new representation of $T$ (based) at
$\mu = 1$. We can now compute the interior eigenvalues of $L_0 D_0 L_0^T$ accurately by the bisection
algorithm using any of the differential qd-like transformations given in Section 4.4.1 as the
inner loop. These “refined” eigenvalues are

$$\delta\lambda_2 = 1.4901161182018 \cdot 10^{-8} \approx \sqrt{\varepsilon}, \qquad \text{and} \qquad \delta\lambda_3 = 2.9802322498846 \cdot 10^{-8} \approx 2\sqrt{\varepsilon}.$$

The relative gap between $\delta\lambda_2$ and $\delta\lambda_3$ is large and so we can attempt to compute the
eigenvectors of $L_0 D_0 L_0^T$ corresponding to these eigenvalues. By employing methods similar
to those given in Chapter 4 (more specifically, Step (4d) of Algorithm X), we find

$$\hat{v}_2 = \begin{bmatrix} .4999999955491866 \\ .4622227251939223 \\ -.1906571596350473 \\ .7071067841882251 \end{bmatrix}, \qquad
\hat{v}_3 = \begin{bmatrix} .4999999997942006 \\ .4622227434674882 \\ -.1906571264658161 \\ -.7071067781848689 \end{bmatrix}.$$

Miraculously, the above vectors are orthogonal to working accuracy with $|\hat{v}_2^T \hat{v}_3| = 2\varepsilon$! As
before, $\hat{v}_2$ and $\hat{v}_3$ are orthogonal to $\hat{v}_1$ and $\hat{v}_4$. On further reflection, we realize that the
success of the above approach is due to the relative robustness of the representation obtained
by (C.0.2). We say that a representation is relatively robust if it determines all its eigenvalues
to high relative accuracy with respect to small relative changes in the individual entries of
the representation.

As opposed to the case of bidiagonal Cholesky factors, there is no existing theory
to show that factors of an indefinite tridiagonal are relatively robust. Indeed, bidiagonal
factors of an indefinite $T - \mu I$ may be far from being relatively robust (see Example 5.2.3).
However, we suspect a strong correlation between the lack of element growth and relative
robustness. We now present some evidence in support of this conjecture.

In Section 5.2, we introduced relative condition numbers $\kappa_{rel}(\lambda_j)$ for each
eigenvalue $\lambda_j$ of an $LDL^T$ representation. These condition numbers indicate the relative change
in an eigenvalue due to small relative perturbations in entries of $L$ and $D$. We define

$$\kappa_{rel}(LDL^T) \stackrel{\mathrm{def}}{=} \max_j \kappa_{rel}(\lambda_j). \tag{C.0.4}$$

Note that if $\kappa(LDL^T) = O(1)$ then $LDL^T$ determines all its eigenvalues to high relative
accuracy. In some cases, an $LDL^T$ representation may determine only a few of its eigenvalues
to high relative accuracy.

In Figure C.1, we have plotted the element growths and the relative condition
numbers, as defined by (C.0.3) and (C.0.4) (see also (5.2.8)) respectively, for representations
of $T_0$ (based) at various shifts. The X-axis plots the shifts $\mu$ of (C.0.2), while the Y-axis
plots the element growths (represented by the green circles o) and relative condition numbers
(represented by the red pluses +) on a logarithmic scale. We have taken extra data points
near Ritz values (eigenvalues of proper leading submatrices), and eigenvalues of $T_0$ since
we may fear large element growths close to these singularities. The eigenvalues of $T_0$ are
indicated by the solid vertical lines, while the Ritz values are indicated by the dotted vertical
lines.

In Figure C.1, we see that whenever the element growth is $O(1)$ the
representation is relatively robust. In particular, relatively robust representations exist near all the
eigenvalues. Earlier, we observed no element growth at $\mu = 1$. The enlarged picture in
Figure C.2 shows that there are many other relatively robust representations near $\lambda_2$ and
$\lambda_3$. However, there are large element growths near the mid-point of these eigenvalues. From
Figure C.1, we also observe that representations based or anchored near the Ritz values of
$T_0$ are not relatively robust.

To see a slightly different behavior, we examine another matrix with identical
spectrum (as in (C.0.1)),

$$T_1 = \begin{bmatrix}
1.00000001117587 & .707106781186547 & & \\
.707106781186547 & .999999977648258 & .707106781186546 & \\
 & .707106781186546 & 1.00000003352761 & 1.05367121277235 \cdot 10^{-8} \\
 & & 1.05367121277235 \cdot 10^{-8} & 1.00000002235174
\end{bmatrix}.$$

Figure C.1: The strong correlation between element growth and relative robustness (X-axis: shift; Y-axis: element growth / relative condition number)

Figure C.2: Blow up of earlier figure: X-axis ranges from $1 + 10^{-8}$ to $1 + 3.5 \times 10^{-8}$

Figure C.3: Relative condition numbers (in the figure to the right, X-axis ranges from $1 + 10^{-8}$ to $1 + 3.5 \times 10^{-8}$)

Here also, in order to compute orthogonal approximations to $v_2$ and $v_3$ we can
attempt to form the representation

$$T_1 - I = L_1 D_1 L_1^T. \tag{C.0.5}$$

However,

$$\mathrm{diag}(D_1) = \begin{bmatrix} 1.117587089538574 \cdot 10^{-8} \\ -4.473924266666669 \cdot 10^{7} \\ 4.470348380358755 \cdot 10^{-8} \\ 1.986821493746602 \cdot 10^{-8} \end{bmatrix}, \qquad
\mathrm{diag}(L_1, -1) = \begin{bmatrix} 6.327084877781628 \cdot 10^{7} \\ -1.580506693551128 \cdot 10^{-8} \\ .2357022791274500 \end{bmatrix}$$

and since $T_1(2,2) - 1 = -2.24 \cdot 10^{-8}$, the element growth is $\approx 10^{8}$! Due to this large
element growth, we may suspect that this representation is not relatively robust. Indeed,
$\kappa(L_1 D_1 L_1^T) \approx 4.5 \cdot 10^{7}$. To see if there are any good representations near 1, we again plot
the element growths and relative condition numbers on the Y-axis against the shifts on the
X-axis. Figure C.3 suggests that there are no good representations near $\lambda_2$ and $\lambda_3$ (the
reader should contrast this with Figures C.1 and C.2, especially the blow ups).
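
For comparison with the benign factorization of $T_0 - I$, the same plain $LDL^T$ recurrence applied to the printed entries of $T_1$ exhibits the large element growth. This is again only a sketch: `growth_at_shift` is a hypothetical helper written for this illustration, and the entries of $T_1$ are taken as printed (about 15 digits), so only the order of magnitude should be read off.

```python
import math


def growth_at_shift(a, b, mu):
    """Pivots of T - mu*I = L D L^T and the element growth of (C.0.3).

    a is the diagonal and b the off-diagonal of the symmetric tridiagonal T.
    Returns (d, growth).
    """
    d = [a[0] - mu]
    for i in range(len(b)):
        d.append((a[i + 1] - mu) - b[i] * b[i] / d[i])
    growth = max(math.sqrt(abs(di / (ai - mu))) for ai, di in zip(a, d))
    return d, growth


# Entries of T1 as printed in the text.
a1 = [1.00000001117587, 0.999999977648258, 1.00000003352761, 1.00000002235174]
b1 = [0.707106781186547, 0.707106781186546, 1.05367121277235e-8]

d1, growth1 = growth_at_shift(a1, b1, 1.0)
print("diag(D1)       =", d1)
print("element growth =", growth1)
```

The tiny first pivot $T_1(1,1) - 1$ is what drives the second pivot to the order of $10^7$, and with it the element growth.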

However, there is no cause for alarm. Remember that we are interested in getting
a “good” representation near $\lambda_2$ and $\lambda_3$ in order to compute the corresponding eigenvectors.
Figure C.3 indicates that there is no representation at these shifts that is relatively robust
for all eigenvalues. However, all we desire of this representation is that it determine its two
smallest eigenvalues to high relative accuracy. Indeed, we discover that despite the large
element growths, the relative condition numbers $\kappa_{rel}(\lambda_2)$ and $\kappa_{rel}(\lambda_3)$ for the representations
close to $\lambda_2$ and $\lambda_3$ are $O(1)$, i.e., the representations near 1 do determine their two smallest
eigenvalues to high relative accuracy. Thus, just as before, we can compute $\hat{v}_2$ and $\hat{v}_3$ from
the representation based at 1, and get

$$\hat{v}_2 = \begin{bmatrix} .4999999981373546 \\ 2.634178061370114 \cdot 10^{-9} \\ -.4999999981373549 \\ .7071067838207259 \end{bmatrix}, \qquad
\hat{v}_3 = \begin{bmatrix} -.5000000018626457 \\ -1.317089024797209 \cdot 10^{-8} \\ .5000000018626453 \\ .7071067785523688 \end{bmatrix}$$

where $|\hat{v}_2^T \hat{v}_3| < 3\varepsilon$!

We have observed numerical behavior similar to $T_0$ and $T_1$ in many larger matrices.
The more common case shows no element growth for shifts near the eigenvalues, whereas
large element growths, as in the case of $T_1$, occur rarely. The study of relative condition
numbers of individual eigenvalues, especially the tiny ones, is pursued in Chapter 5.


